Abstract

Wyner's common information was originally defined for a pair of dependent discrete random variables. Its significance is largely reflected in, and hence also confined to, several existing interpretations in various source coding problems. This paper attempts both to generalize its definition and to expand its practical significance by providing a new operational interpretation. The generalization is twofold: the number of dependent variables can be arbitrary, and so can the alphabets of those random variables. New properties are determined for the generalized Wyner's common information of multiple dependent variables. More importantly, a lossy source coding interpretation of Wyner's common information is developed using the Gray-Wyner network. In particular, it is established that the common information equals the smallest common message rate when the total rate is arbitrarily close to the rate distortion function with joint decoding. A surprising observation is that this equality holds independently of the values of the distortion constraints, as long as the distortions are within some distortion region. Examples of the computation of common information are given, including that of a pair of dependent Gaussian random variables.

Index Terms 

Common information, Gray-Wyner network, rate distortion function 

I. Introduction 

Consider a pair of dependent random variables $X$ and $Y$ with joint distribution $p(x,y)$, which denotes either the probability density function if $X$ and $Y$ are continuous or the probability mass function if $X$ and $Y$ are discrete. Quantifying the information that is common between $X$ and $Y$ has been a classical problem both in information theory and in mathematical statistics [1]-[4]. The most widely used notion is Shannon's mutual information, defined as

$$I(X;Y) = \mathbb{E}\left[\log \frac{p(X,Y)}{p(X)\,p(Y)}\right],$$

where $p(x)$ and $p(y)$ are the marginal distributions of $X$ and $Y$ corresponding to the joint distribution $p(x,y)$, and $\mathbb{E}[\cdot]$ denotes expectation with respect to $p(x,y)$. Shannon's mutual information measures the reduction in uncertainty about one variable obtained by observing the other. Its significance lies in its applications to a broad range of problems in which concrete operational meanings of $I(X;Y)$ can be established. These include both source and channel coding problems in information and communication theory [5] and hypothesis testing problems in statistical inference [6].
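For discrete $X$ and $Y$, the expectation above reduces to a finite sum. The following minimal Python sketch evaluates $I(X;Y)$ in bits from a joint probability mass function stored as a nested list; the function name `mutual_information` and the value $a_0 = 0.1$ are illustrative choices, not notation from this paper.

```python
from math import log2


def mutual_information(pxy):
    """Return I(X;Y) in bits for a joint pmf given as pxy[x][y]."""
    px = [sum(row) for row in pxy]             # marginal p(x)
    py = [sum(col) for col in zip(*pxy)]       # marginal p(y)
    total = 0.0
    for x, row in enumerate(pxy):
        for y, pxy_val in enumerate(row):
            if pxy_val > 0:                    # convention: 0 log 0 = 0
                total += pxy_val * log2(pxy_val / (px[x] * py[y]))
    return total


# Doubly symmetric binary source: X and Y disagree with probability a0.
a0 = 0.1
dsbs = [[(1 - a0) / 2, a0 / 2],
        [a0 / 2, (1 - a0) / 2]]
print(mutual_information(dsbs))
```

For this source the result matches the closed form $1 - h(a_0)$, with $h(\cdot)$ the binary entropy function, while any product distribution gives zero.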

Other notions of information have also been defined between a pair of dependent variables. Most notable among them are Gács and Körner's common randomness $K(X,Y)$ [2] and Wyner's common information $C(X,Y)$ [4]. Gács and Körner's common randomness is defined as the maximum number of common bits per symbol that can be independently extracted from $X$ and $Y$. Quite naturally, $K(X,Y)$ has found extensive applications in secure communications, e.g., for key generation [7]-[9]. More recently, a new interpretation of $K(X,Y)$ using the Gray-Wyner source coding network was given in [10]. It was noted in [2], [11] that the definition of $K(X,Y)$ is rather restrictive in that $K(X,Y)$ equals zero in most cases, except for the special case when $X = (X', V)$ and $Y = (Y', V)$ with $X'$, $Y'$, $V$ independent variables, or those $(X,Y)$ pairs that can be converted to such a dependence structure by relabeling the realizations, i.e., whose joint distribution matrix is a row and column permutation of the joint distribution matrix of such a pair. Notice also that $K(X,Y)$ is defined only for discrete random variables.
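To make the remark that $K(X,Y)$ is usually zero concrete, the sketch below relies on a characterization assumed from the Gács-Körner literature rather than stated in this paper: for finite alphabets, $K(X,Y)$ equals the entropy of the index of the connected component of the bipartite graph that links $x$ and $y$ whenever $p(x,y) > 0$. The function name `gacs_korner` and the example matrices are hypothetical.

```python
from math import log2


def gacs_korner(pxy):
    """Return K(X,Y) in bits for a joint pmf pxy[x][y].

    Assumes K(X,Y) is the entropy of the connected component containing
    (X, Y) in the bipartite graph linking x and y whenever p(x, y) > 0.
    """
    nx, ny = len(pxy), len(pxy[0])
    parent = list(range(nx + ny))      # nodes 0..nx-1 are x's, nx.. are y's

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for x in range(nx):                # merge x and y when p(x, y) > 0
        for y in range(ny):
            if pxy[x][y] > 0:
                parent[find(x)] = find(nx + y)

    mass = {}                          # probability of each component
    for x in range(nx):
        for y in range(ny):
            if pxy[x][y] > 0:
                root = find(x)
                mass[root] = mass.get(root, 0.0) + pxy[x][y]
    return sum(-m * log2(m) for m in mass.values())


# X = (X', V), Y = (Y', V) with V a fair bit: the support is block diagonal.
block = [[0.125, 0.125, 0.0, 0.0],
         [0.125, 0.125, 0.0, 0.0],
         [0.0, 0.0, 0.125, 0.125],
         [0.0, 0.0, 0.125, 0.125]]
print(gacs_korner(block))                          # one common bit
print(gacs_korner([[0.45, 0.05], [0.05, 0.45]]))   # DSBS: no common bits
```

Any full-support joint distribution, such as the DSBS, yields a single component and hence $K(X,Y) = 0$, even though $I(X;Y) > 0$.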

Wyner's common information was originally defined for a pair of discrete random variables with finite alphabets as

$$C(X,Y) = \inf_{X - W - Y} I(X,Y;W). \tag{1}$$

Here, the infimum is taken over all auxiliary random variables $W$ such that $X$, $W$, and $Y$ form a Markov chain. Clearly, the quantity $C(X,Y)$ in (1) can be defined for any pair of random variables with arbitrary alphabets. However, the operational meanings of $C(X,Y)$ available in the existing literature are largely confined to discrete $X$ and $Y$. These include the minimum common rate for the Gray-Wyner lossless source coding problem under a sum rate constraint, the minimum rate of a common input of two independent random channels for distribution approximation [4], and the strong coordination capacity of a two-node network without common randomness and with actions assigned at one node [12].

This paper intends to generalize Wyner's common information along two directions. The first is to generalize it to multiple dependent random variables. The second is to generalize it to continuous random variables.

For the first direction, Wyner's common information is defined through a conditional independence structure, which is equivalent to the Markov chain condition for two dependent variables. Relevant properties of this generalization are derived. In addition, we prove that Wyner's original interpretations in [4] can be directly extended to the case of multiple variables. Note that both mutual information and common randomness have also been generalized to multiple random variables [14]-[16].

For the second direction, we provide a new lossy source coding interpretation using the Gray-Wyner network. Specifically, we show that, for the Gray-Wyner network, Wyner's common information is precisely the smallest common message rate for a certain range of distortion constraints when the total rate is arbitrarily close to the rate distortion function with joint decoding. As the common information is only a function of the joint distribution, this smallest common rate remains constant even if the distortion constraints vary, as long as they are in a specific distortion region. There has also been recent effort in characterizing the common message rate for lossy source coding using the Gray-Wyner network [17]. We establish the equivalence between the characterization in [17] and an alternative characterization presented in this paper.

Computing Wyner's common information is known to be a challenging problem; $C(X,Y)$ has been resolved only for several special cases described in [4], [13]. Along with our generalizations of Wyner's common information, we provide two new examples in which the common information of multiple dependent variables can be evaluated explicitly. In particular, we derive, through an estimation-theoretic approach, $C(X,Y)$ for a bivariate Gaussian source and its extension to the multivariate case with a certain correlation structure.

The rest of the paper is organized as follows. Section II reviews Wyner's two approaches to the common information of two discrete random variables, the general Gray-Wyner network, and the relations among joint, marginal, and conditional rate distortion functions. Section III gives the definition of Wyner's common information for $N$ dependent random variables with arbitrary alphabets, along with some associated properties. The operational meanings of Wyner's common information developed in [4] are also extended to $N$ discrete dependent random variables in Section III. In Section IV, we provide a new interpretation of Wyner's common information using the Gray-Wyner lossy source coding network. Specifically, we prove that for the Gray-Wyner network, Wyner's common information is precisely the smallest common message rate for a certain range of distortion constraints when the total rate is arbitrarily close to the rate distortion function with joint decoding. In Section V, two examples, the doubly symmetric binary source and the bivariate Gaussian source, are used to illustrate the lossy source coding interpretation of Wyner's common information. The common information of the bivariate Gaussian source and its extension to the multivariate case are also derived in Section V. Section VI concludes this paper.

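Notation: Throughout this paper, we use a calligraphic letter $\mathcal{X}$ to denote the alphabet and $p(x)$ to denote either the probability mass function or the probability density function of a random variable $X$. A boldface capital letter $\mathbf{X}^{\mathcal{A}}$ denotes the vector of random variables $\{X_i\}_{i \in \mathcal{A}}$, where $\mathcal{A}$ is an index set. $\mathcal{A} \setminus \mathcal{B}$ denotes set-theoretic subtraction, i.e., $\mathcal{A} \setminus \mathcal{B} = \{x : x \in \mathcal{A} \text{ and } x \notin \mathcal{B}\}$. For two real vectors $\mathbf{x}$ and $\mathbf{y}$ of identical size, $\mathbf{x} \le \mathbf{y}$ denotes component-wise inequality.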



II. Existing Results

A. Wyner's result

Wyner defined the common information of two discrete random variables $X$ and $Y$ with distribution $p(x,y)$ in equation (1) and provided two operational meanings for this definition. The first approach is shown in Fig. 1. This model is a source coding network first studied by Gray and Wyner in [18]. In this model, the encoder observes a pair of sequences $(X^n, Y^n)$ and maps them to three messages $W_0, W_1, W_2$, taking values in alphabets of respective sizes $2^{nR_0}$, $2^{nR_1}$, and $2^{nR_2}$. Decoder 1, upon receiving $(W_0, W_1)$, needs to reproduce $X^n$ with high reliability, while decoder 2, upon receiving $(W_0, W_2)$, needs to reproduce $Y^n$ with high reliability. Define

$$\Delta = \frac{1}{n}\,\mathbb{E}\left[d_H(X^n, \hat{X}^n) + d_H(Y^n, \hat{Y}^n)\right],$$

where $d_H(\cdot,\cdot)$ is the Hamming distance. Let $C_1$ be the infimum of all achievable $R_0$ for the system in Fig. 1 such that for any $\epsilon > 0$, there exists, for $n$ sufficiently large, a source code with total rate $R_0 + R_1 + R_2 \le H(X,Y) + \epsilon$ and $\Delta \le \epsilon$.

Fig. 1. Source coding over a simple network.

Fig. 2. Random variable generators.

The second approach is shown in Fig. 2. In this approach, the joint distribution $p(x^n, y^n) = \prod_{i=1}^n p(x_i, y_i)$ is approximated by the output distribution of a pair of random number generators. A common input $W$, uniformly distributed on $\mathcal{W} = \{1, \dots, 2^{nR_0}\}$, is sent to two separate processors which are independent of each other. These processors (random number generators) generate independent and identically distributed (i.i.d.) sequences according to two distributions $q_1(x^n|w)$ and $q_2(y^n|w)$, respectively. The output sequences of the two processors are denoted by $\hat{X}^n$ and $\hat{Y}^n$, respectively, and the joint distribution of the output sequences is given by

$$q(x^n, y^n) = \sum_{w \in \mathcal{W}} 2^{-nR_0}\, q_1(x^n|w)\, q_2(y^n|w).$$

Let

$$D_n(q,p) = \frac{1}{n} \sum_{x^n, y^n} q(x^n, y^n) \log \frac{q(x^n, y^n)}{p(x^n, y^n)}.$$

Let $C_2$ be the infimum of rates $R_0$ for the common input such that for any $\epsilon > 0$, there exist a pair of distributions $q_1(x^n|w)$, $q_2(y^n|w)$ and $n$ such that $D_n(q,p) \le \epsilon$. Wyner proved in [4] that

$$C_1 = C_2 = C(X,Y).$$

B. Generalized Gray-Wyner networks

Consider the Gray-Wyner source coding network [18] with one encoder and $N$ decoders as shown in Fig. 3. The encoder observes an i.i.d. vector source sequence $(\mathbf{X}_1, \dots, \mathbf{X}_n)$, where each $\mathbf{X}_k = (X_{1k}, \dots, X_{Nk})$, $k = 1, \dots, n$, is a length-$N$ vector with joint distribution $p(\mathbf{x})$. Denote by $X_i^n = [X_{i1}, \dots, X_{in}]$ the $i$th component of the vector sequence. There are a total of $N$ receivers, with the $i$th receiver only interested in recovering the $i$th component sequence $X_i^n$. The encoder encodes the source into $N+1$ messages: one is a public message available at all receivers, while the other $N$ messages are private messages available only at the corresponding receivers.

Fig. 3. Generalized Gray-Wyner source coding network.

For $m = 1, 2, \dots$, let $I_m = \{0, 1, 2, \dots, m-1\}$. An $(n, M_0, M_1, \dots, M_N)$ code is defined by

• An encoder mapping
$$f: \mathcal{X}_1^n \times \cdots \times \mathcal{X}_N^n \to I_{M_0} \times I_{M_1} \times \cdots \times I_{M_N},$$

• $N$ decoder mappings
$$g_i: I_{M_i} \times I_{M_0} \to \hat{\mathcal{X}}_i^n, \quad i = 1, 2, \dots, N.$$

For an $(n, M_0, M_1, \dots, M_N)$ code, let $f(\mathbf{X}_1, \dots, \mathbf{X}_n) = (W_0, W_1, \dots, W_N)$ and $\hat{X}_i^n = g_i(W_i, W_0)$, $i = 1, 2, \dots, N$.

We discuss below lossless and lossy source coding using the generalized Gray-Wyner network.

1) Lossless Gray-Wyner source coding: Define the probability of error as

$$P_e^{(n)} = \frac{1}{n} \sum_{i=1}^N \mathbb{E}\left[d_H(X_i^n, \hat{X}_i^n)\right], \tag{2}$$

where $\hat{X}_i^n = g_i(W_i, W_0)$ for $i = 1, \dots, N$ and $d_H(u^n, v^n)$ is the Hamming distance between $u^n$ and $v^n$.

A rate tuple $(R_0, R_1, \dots, R_N)$ is said to be achievable if for any $\epsilon > 0$, there exists, for $n$ sufficiently large, an $(n, M_0, M_1, \dots, M_N)$ code such that

$$M_i \le 2^{n(R_i + \epsilon)}, \quad i = 0, 1, \dots, N, \tag{3}$$
$$P_e^{(n)} \le \epsilon. \tag{4}$$

Denote by $\mathcal{R}_1$ the region of all achievable rate tuples $(R_0, R_1, \dots, R_N)$.

Theorem 1: $\mathcal{R}_1$ is the union of all rate tuples $(R_0, R_1, \dots, R_N)$ that satisfy

$$R_0 \ge I(X_1, \dots, X_N; W), \tag{5}$$
$$R_i \ge H(X_i|W), \quad i = 1, 2, \dots, N, \tag{6}$$

for some $W \sim p(w|x_1, \dots, x_N)$.
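Theorem 1 can be explored numerically for small discrete examples. The sketch below, with hypothetical helper names `entropy`, `marginal` and `corner_point`, evaluates the corner point $R_0 = I(X_1, \dots, X_N; W)$, $R_i = H(X_i|W)$ of (5)-(6) for a given joint pmf of $(\mathbf{X}, W)$; as an illustrative test channel it borrows $W = S$ from the binary example of Section V-A, with assumed values $\theta = 1/2$ and $a_1 = 0.1$.

```python
from collections import defaultdict
from math import log2


def entropy(pmf):
    """Entropy in bits of a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def marginal(pxw, key):
    """Marginalize a joint pmf {(x_tuple, w): p} onto key(x, w)."""
    out = defaultdict(float)
    for (x, w), p in pxw.items():
        out[key(x, w)] += p
    return out


def corner_point(pxw):
    """Return ([R0, R1, ..., RN], H(X)) with R0 = I(X;W), Ri = H(Xi|W)."""
    n = len(next(iter(pxw))[0])
    h_w = entropy(marginal(pxw, lambda x, w: w))
    h_x = entropy(marginal(pxw, lambda x, w: x))
    r0 = h_x + h_w - entropy(pxw)                    # I(X_1..X_N; W)
    private = [entropy(marginal(pxw, lambda x, w, i=i: (x[i], w))) - h_w
               for i in range(n)]                    # H(X_i | W)
    return [r0] + private, h_x


# Hypothetical test channel: W = S in the binary example of Section V-A.
theta, a1 = 0.5, 0.1


def bsc(x, s):
    return 1 - a1 if x == s else a1


pxw = {((x1, x2), s): (theta if s else 1 - theta) * bsc(x1, s) * bsc(x2, s)
       for x1 in (0, 1) for x2 in (0, 1) for s in (0, 1)}
rates, h_x = corner_point(pxw)
print(rates, sum(rates), h_x)
```

Because $X_1$ and $X_2$ are conditionally independent given $S$, the sum rate $R_0 + R_1 + R_2$ equals $H(X_1, X_2)$ here. In general $I(\mathbf{X};W) + \sum_i H(X_i|W) \ge H(\mathbf{X})$, with equality exactly when the $X_i$ are conditionally independent given $W$; a constant $W$, for instance, gives $R_0 = 0$ but a sum rate of $H(X_1) + H(X_2) > H(X_1, X_2)$.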

2) Lossy Gray-Wyner source coding: Let $d(\mathbf{x}, \hat{\mathbf{x}}) = (d_1(x_1, \hat{x}_1), \dots, d_N(x_N, \hat{x}_N))$ be a compound distortion measure. Define $\Delta_i$, $i = 1, \dots, N$, to be the average distortion between the $i$th component sequence of the encoder input and the $i$th decoder output,

$$\Delta_i = \mathbb{E}\left[d_i(X_i^n, \hat{X}_i^n)\right] = \frac{1}{n} \sum_{k=1}^n \mathbb{E}\left[d_i(X_{ik}, \hat{X}_{ik})\right]. \tag{7}$$

Define the vector of average distortions to be $\boldsymbol{\Delta} = (\Delta_1, \dots, \Delta_N)$. An $(n, M_0, M_1, \dots, M_N)$ code with average distortion vector $\boldsymbol{\Delta}$ is said to be an $(n, M_0, M_1, \dots, M_N, \boldsymbol{\Delta})$ rate distortion code. Let $\mathbf{D} = (D_1, D_2, \dots, D_N) \in \mathbb{R}^N$. A rate tuple $(R_0, R_1, \dots, R_N)$ is said to be $\mathbf{D}$-achievable if for arbitrary $\epsilon > 0$, there exists, for $n$ sufficiently large, an $(n, M_0, M_1, \dots, M_N, \boldsymbol{\Delta})$ code such that

$$M_i \le 2^{n(R_i + \epsilon)}, \quad i = 0, 1, \dots, N, \tag{8}$$
$$\boldsymbol{\Delta} \le \mathbf{D} + \epsilon. \tag{9}$$

Let $\mathcal{R}_2(\mathbf{D})$ be the region of all $\mathbf{D}$-achievable rate tuples $(R_0, R_1, \dots, R_N)$.

Theorem 2: $\mathcal{R}_2(\mathbf{D})$ is the union of all rate tuples $(R_0, R_1, \dots, R_N)$ that satisfy

$$R_0 \ge I(X_1, \dots, X_N; W), \tag{10}$$
$$R_i \ge R_{X_i|W}(D_i), \quad i = 1, 2, \dots, N, \tag{11}$$

for some $W \sim p(w|x_1, \dots, x_N)$.

Here, $R_{X_i|W}(D_i)$ is the conditional rate distortion function defined as [21]

$$R_{X_i|W}(D_i) = \min_{p_t(\hat{x}_i|x_i, w):\ \mathbb{E}\, d_i(X_i, \hat{X}_i) \le D_i} I(X_i; \hat{X}_i | W). \tag{12}$$

Theorems 1 and 2 are direct extensions of Theorems 4 and 8 in [18] for the Gray-Wyner network with two receivers. Note that in [18], the authors proved [18, Theorem 8] only for the discrete case; the proof for continuous alphabets can be constructed in a similar fashion.

C. Joint, marginal and conditional rate distortion functions

In this section, we review the joint, marginal and conditional rate distortion functions and their relations. Two-dimensional sources will be considered; the results can be generalized immediately to $N$-dimensional vector sources.

Given a two-dimensional source $(X_1, X_2)$ with probability distribution $p(x_1, x_2)$ and two distortion measures $d_1(x_1, \hat{x}_1)$ and $d_2(x_2, \hat{x}_2)$ defined on $\mathcal{X}_1 \times \hat{\mathcal{X}}_1$ and $\mathcal{X}_2 \times \hat{\mathcal{X}}_2$, the joint rate distortion function is given by

$$R_{X_1X_2}(D_1, D_2) = \min I(X_1X_2; \hat{X}_1\hat{X}_2), \tag{13}$$

where the minimum is taken over all test channels $p_t(\hat{x}_1\hat{x}_2|x_1x_2)$ such that $\mathbb{E}\, d_1(X_1, \hat{X}_1) \le D_1$ and $\mathbb{E}\, d_2(X_2, \hat{X}_2) \le D_2$. The conditional rate distortion function is defined in (12). The joint, marginal and conditional rate distortion functions satisfy the following inequalities.

Lemma 1: [19], [20] Given a two-dimensional source $(X_1, X_2)$ with joint distribution $p(x_1, x_2)$ and two distortion measures $d_1(x_1, \hat{x}_1)$, $d_2(x_2, \hat{x}_2)$ defined respectively on $\mathcal{X}_1 \times \hat{\mathcal{X}}_1$ and $\mathcal{X}_2 \times \hat{\mathcal{X}}_2$, the joint, conditional and marginal rate distortion functions satisfy the following inequalities:

$$R_{X_1X_2}(D_1, D_2) \ge R_{X_1|X_2}(D_1) + R_{X_2}(D_2), \tag{14a}$$
$$R_{X_1|X_2}(D_1) \ge R_{X_1}(D_1) - I(X_1;X_2), \tag{14b}$$
$$R_{X_1X_2}(D_1, D_2) \ge R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1;X_2), \tag{14c}$$
$$R_{X_1}(D_1) \ge R_{X_1|X_2}(D_1), \tag{15a}$$
$$R_{X_1}(D_1) + R_{X_2}(D_2) \ge R_{X_1X_2}(D_1, D_2). \tag{15b}$$

Sufficient conditions for equality in (14) are that the optimum backward test channels for the functions on the left side of each inequality factor appropriately, i.e., for (14a) $p_b(x_1x_2|\hat{x}_1\hat{x}_2) = p(x_1|\hat{x}_1x_2)\,p(x_2|\hat{x}_2)$, for (14b) $p_b(x_1|\hat{x}_1x_2) = p(x_1|\hat{x}_1)$, and for (14c) $p_b(x_1x_2|\hat{x}_1\hat{x}_2) = p(x_1|\hat{x}_1)\,p(x_2|\hat{x}_2)$. Equalities hold in (15) if and only if $X_1$ and $X_2$ are independent.

Furthermore, Gray has shown that under quite general conditions, equalities hold in (14) for small values of distortion. This is because the marginal, joint and conditional rate distortion functions equal their Extended Shannon Lower Bounds (ESLB) [19], [21] under suitable conditions. These ESLBs, denoted by $R^{(L)}_X(D)$ for a rate distortion function $R_X(D)$, satisfy the property stated in Lemma 2 below. To state it, denote by $\mathcal{D}$ a surface in the $m$-dimensional space; the inequality $\boldsymbol{\Delta} \le \mathcal{D}$ means that there exists a vector $\boldsymbol{\beta} \in \mathcal{D}$ such that $\boldsymbol{\Delta} \le \boldsymbol{\beta}$. If there is no such vector, we write $\boldsymbol{\Delta} > \mathcal{D}$. Likewise, $\mathcal{D}_1 \le \mathcal{D}_2$ means that $\boldsymbol{\beta} \le \mathcal{D}_2$ for any $\boldsymbol{\beta} \in \mathcal{D}_1$ [19].

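Lemma 2: [19] Given a two-dimensional source $(X_1, X_2)$ with joint distribution $p(x_1, x_2)$ such that $p(x_2|x_1) > 0$ for all $x_1 \in \mathcal{X}_1$, $x_2 \in \mathcal{X}_2$, reproduction alphabets $\hat{\mathcal{X}}_1 = \mathcal{X}_1$, $\hat{\mathcal{X}}_2 = \mathcal{X}_2$, and two per-letter distortion measures $d_1(x_1, \hat{x}_1)$ and $d_2(x_2, \hat{x}_2)$ that satisfy

$$d_i(x_i, \hat{x}_i) > d_i(x_i, x_i) = 0, \quad \hat{x}_i \ne x_i,\ i = 1, 2, \tag{16}$$

there exist strictly positive surfaces $\mathcal{D}(X_1X_2)$, $\mathcal{D}(X_1|X_2)$, $\mathcal{D}(X_1)$ and $\mathcal{D}(X_2)$ such that

$$R_{X_1X_2}(D_1, D_2) = R^{(L)}_{X_1X_2}(D_1, D_2), \quad \text{if } (D_1, D_2) \le \mathcal{D}(X_1X_2),$$
$$R_{X_1|X_2}(D_1) = R^{(L)}_{X_1|X_2}(D_1), \quad \text{if } D_1 \le \mathcal{D}(X_1|X_2),$$
$$R_{X_1}(D_1) = R^{(L)}_{X_1}(D_1), \quad \text{if } D_1 \le \mathcal{D}(X_1),$$
$$R_{X_2}(D_2) = R^{(L)}_{X_2}(D_2), \quad \text{if } D_2 \le \mathcal{D}(X_2),$$

and

$$\mathcal{D}(X_1|X_2) \le \mathcal{D}(X_1),$$
$$\mathcal{D}(X_1X_2) \le \big(\mathcal{D}(X_1|X_2), \mathcal{D}(X_2)\big) \le \big(\mathcal{D}(X_1), \mathcal{D}(X_2)\big).$$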
Finally,

$$R^{(L)}_{X_1X_2}(D_1, D_2) = R^{(L)}_{X_1|X_2}(D_1) + R^{(L)}_{X_2}(D_2) \tag{17}$$
$$= R^{(L)}_{X_1}(D_1) + R^{(L)}_{X_2}(D_2) - I(X_1;X_2). \tag{18}$$

It is apparent that when the rate distortion functions equal their corresponding ESLBs, equations (17) and (18) imply equalities in (14a), (14b) and (14c).

III. The Common Information of N Dependent Discrete Random Variables 
A. Definition

Wyner's original definition of the common information in (1) assumes a Markov chain among the random variables $X$, $Y$ and the auxiliary variable $W$, i.e., $X - W - Y$. This Markov chain is equivalent to stating that $X$ and $Y$ are conditionally independent given $W$. This conditional independence structure can be naturally generalized to $N$ dependent random variables. Let $\mathbf{X} = (X_1, \dots, X_N)$ be $N$ dependent random variables that take values in some arbitrary (finite, countable, or continuous) spaces $\mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_N$. The joint distribution of $\mathbf{X}$ is denoted as $p(\mathbf{x})$, which is either a probability mass function or a probability density function. We now give the definition of the common information for $N$ dependent random variables.

Definition 1: Let $\mathbf{X}$ be a random vector with joint distribution $p(\mathbf{x})$. The common information of $\mathbf{X}$ is defined as

$$C(\mathbf{X}) \triangleq \inf I(\mathbf{X}; W), \tag{19}$$

where the infimum is taken over all joint distributions of $(\mathbf{X}, W)$ such that

1) the marginal distribution of $\mathbf{X}$ is $p(\mathbf{x})$,

2) $X_1, \dots, X_N$ are conditionally independent given $W$, i.e.,

$$p(\mathbf{x}|w) = \prod_{i=1}^N p(x_i|w). \tag{20}$$
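On finite alphabets, any $W$ that satisfies (20) yields a feasible value $I(\mathbf{X};W) \ge C(\mathbf{X})$. The sketch below, with the hypothetical function `objective_if_feasible`, checks (20) for a candidate joint pmf of $(\mathbf{X}, W)$ and returns the objective only when the candidate is admissible. The candidates reuse the binary example of Section V-A with illustrative values $\theta = 0.3$, $a_1 = 0.1$, $N = 3$. It only evaluates feasible points; it does not solve the infimum in (19).

```python
from collections import defaultdict
from itertools import product
from math import isclose, log2


def objective_if_feasible(pxw, tol=1e-12):
    """Return I(X;W) in bits if p(x|w) = prod_i p(x_i|w), i.e. (20) holds;
    otherwise return None. pxw maps (x_tuple, w) -> probability."""
    n = len(next(iter(pxw))[0])
    pw, px, pxi_w = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, w), p in pxw.items():
        pw[w] += p
        px[x] += p
        for i, xi in enumerate(x):
            pxi_w[(i, xi, w)] += p
    alphabets = [sorted({x[i] for (x, _) in pxw}) for i in range(n)]
    mi = 0.0
    for w, p_w in pw.items():
        if p_w <= 0:
            continue
        for x in product(*alphabets):              # check (20) on every x
            p = pxw.get((x, w), 0.0)
            prod = 1.0
            for i, xi in enumerate(x):
                prod *= pxi_w[(i, xi, w)] / p_w
            if not isclose(p / p_w, prod, abs_tol=tol):
                return None                         # (20) is violated
            if p > 0:
                mi += p * log2(p / (px[x] * p_w))
    return mi


theta, a1 = 0.3, 0.1


def p_joint(x, s):
    prob = theta if s else 1 - theta
    for xi in x:
        prob *= 1 - a1 if xi == s else a1
    return prob


xs = list(product((0, 1), repeat=3))
px = {x: p_joint(x, 0) + p_joint(x, 1) for x in xs}
with_s = {(x, s): p_joint(x, s) for x in xs for s in (0, 1)}
with_pair = {(x, x[:2]): px[x] for x in xs}      # W = (X1, X2)
with_const = {(x, 0): px[x] for x in xs}         # W independent of X

print(objective_if_feasible(with_s))       # I(X;S)
print(objective_if_feasible(with_pair))    # H(X1, X2), a larger value
print(objective_if_feasible(with_const))   # None: (20) fails
```

Choosing $W = (X_1, X_2)$ is always admissible and gives $H(X_1, X_2)$, while $W = S$ gives a smaller feasible value; a constant $W$ is rejected because the $X_i$ are dependent.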

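We now discuss several properties associated with the definition given in (19).

Wyner's common information of two random variables $(X_1, X_2)$ satisfies the following inequality:

$$I(X_1;X_2) \le C(X_1, X_2) \le \min\{H(X_1), H(X_2)\}.$$

A similar inequality for the common information of $N$ random variables can be derived. Let $\mathcal{A} \subset \mathcal{N} = \{1, 2, \dots, N\}$ and $\mathcal{A}^c = \mathcal{N} \setminus \mathcal{A}$. We have

$$\max_{\mathcal{A}} I(\mathbf{X}^{\mathcal{A}}; \mathbf{X}^{\mathcal{A}^c}) \le C(\mathbf{X}) \le \min_j H(\mathbf{X}^{-j}), \tag{21}$$

where $\mathbf{X}^{-j} \triangleq \mathbf{X}^{\mathcal{N} \setminus \{j\}} = (X_1, \dots, X_{j-1}, X_{j+1}, \dots, X_N)$ for $j \in \mathcal{N}$.

To verify the upper bound, for any $j \in \mathcal{N}$, let $W_j = \mathbf{X}^{-j}$. Then $X_1, \dots, X_N$ are conditionally independent given $W_j$, since every component other than $X_j$ is a deterministic function of $W_j$, and

$$I(\mathbf{X}; W_j) = I(\mathbf{X}; \mathbf{X}^{-j}) = H(\mathbf{X}^{-j}).$$

Thus $C(\mathbf{X}) \le H(\mathbf{X}^{-j})$ for all $j \in \mathcal{N}$.

For the lower bound, since $X_1, \dots, X_N$ are conditionally independent given $W$, we have the Markov chain $\mathbf{X}^{\mathcal{A}} - W - \mathbf{X}^{\mathcal{A}^c}$ for any subset $\mathcal{A} \subset \mathcal{N}$. Hence,

$$I(\mathbf{X}; W) \ge I(\mathbf{X}^{\mathcal{A}}; W) \ge I(\mathbf{X}^{\mathcal{A}}; \mathbf{X}^{\mathcal{A}^c}),$$

where the second inequality is by the data processing inequality. Therefore,

$$I(\mathbf{X}; W) \ge \max_{\mathcal{A}} I(\mathbf{X}^{\mathcal{A}}; \mathbf{X}^{\mathcal{A}^c}). \tag{22}$$

Taking the infimum over all admissible $W$ in (22) gives the lower bound in (21).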
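Both sides of (21) depend only on $p(\mathbf{x})$, so for small discrete examples they can be evaluated by brute force over all subsets $\mathcal{A}$. The sketch below (hypothetical helper names; finite alphabets assumed) computes the lower bound $\max_{\mathcal{A}} I(\mathbf{X}^{\mathcal{A}}; \mathbf{X}^{\mathcal{A}^c})$ and the upper bound $\min_j H(\mathbf{X}^{-j})$ for the binary example (47) of Section V-A with illustrative parameters.

```python
from itertools import combinations, product
from math import log2


def entropy(pmf):
    """Entropy in bits of a dict {outcome: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


def project(px, idx):
    """Marginal pmf of the components listed in idx."""
    out = {}
    for x, p in px.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def bounds_21(px):
    """(lower, upper) bounds of (21) for a pmf {x_tuple: p}."""
    n = len(next(iter(px)))
    idx = range(n)
    h_all = entropy(px)
    lower = 0.0
    for k in range(1, n):
        for a in combinations(idx, k):
            ac = [i for i in idx if i not in a]
            lower = max(lower, entropy(project(px, a))
                        + entropy(project(px, ac)) - h_all)   # I(X^A; X^Ac)
    upper = min(entropy(project(px, [i for i in idx if i != j]))
                for j in idx)                                   # H(X^{-j})
    return lower, upper


# Binary example of Section V-A with illustrative parameters.
theta, a1, N = 0.3, 0.1, 3
px = {}
for x in product((0, 1), repeat=N):
    t = sum(x)
    px[x] = (theta * (1 - a1) ** t * a1 ** (N - t)
             + (1 - theta) * a1 ** t * (1 - a1) ** (N - t))
print(bounds_21(px))
```

For this source, the value $I(X_1, X_2, X_3; S)$ from Proposition 1 falls between the two printed numbers, as (21) requires; for a DSBS the two bounds reduce to $I(X_1;X_2)$ and $1$.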

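The common information defined in (19) also satisfies the following monotone property.

Lemma 3: Let $\mathbf{X} \sim p(\mathbf{x})$. For any two sets $\mathcal{A}$, $\mathcal{B}$ that satisfy $\mathcal{A} \subseteq \mathcal{B} \subseteq \mathcal{N} = \{1, 2, \dots, N\}$, we have

$$C(\mathbf{X}^{\mathcal{A}}) \le C(\mathbf{X}^{\mathcal{B}}). \tag{23}$$

Proof: Let $W$ be the auxiliary variable that achieves $C(\mathbf{X}^{\mathcal{B}})$, i.e., $I(\mathbf{X}^{\mathcal{B}}; W) = \inf_{W'} I(\mathbf{X}^{\mathcal{B}}; W')$. Since $\mathcal{A} \subseteq \mathcal{B}$, the components of $\mathbf{X}^{\mathcal{B}}$ being conditionally independent given $W$ implies that the components of $\mathbf{X}^{\mathcal{A}}$ are conditionally independent given $W$. Thus

$$C(\mathbf{X}^{\mathcal{B}}) = I(\mathbf{X}^{\mathcal{B}}; W) \ge I(\mathbf{X}^{\mathcal{A}}; W) \ge \inf_{W'} I(\mathbf{X}^{\mathcal{A}}; W') = C(\mathbf{X}^{\mathcal{A}}),$$

where the last infimum is taken over all $W'$ such that the components of $\mathbf{X}^{\mathcal{A}}$ are conditionally independent given $W'$. ■

The above monotone property of the common information is contrary to what the name implies: conceptually, the information in common ought to decrease when new variables are included in the set of random variables. Such is the case for Gács and Körner's common randomness, i.e., $K(\mathbf{X}^{\mathcal{A}}) \ge K(\mathbf{X}^{\mathcal{B}})$. As a consequence, we have $C(\mathbf{X}) \ge K(\mathbf{X})$ for any $N$ random variables. The fact that the common information $C(\mathbf{X})$ increases as more variables are involved suggests that it may have potential applications in statistical inference problems. This was explored in [22].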

B. Coding theorems for the common information of N discrete random variables

Section II-A describes two operational interpretations of Wyner's common information for two discrete random variables, based on the Gray-Wyner network and on distribution approximation. These operational interpretations can also be extended to the common information of $N$ dependent random variables.

For the first approach, we consider the lossless Gray-Wyner network with $N$ terminals. For this network, a number $R_0$ is said to be achievable if for any $\epsilon > 0$, there exists, for $n$ sufficiently large, an $(n, M_0, M_1, \dots, M_N)$ code with

$$M_0 \le 2^{nR_0}, \tag{24}$$
$$\frac{1}{n} \sum_{i=0}^N \log M_i \le H(\mathbf{X}) + \epsilon, \tag{25}$$
$$P_e^{(n)} \le \epsilon. \tag{26}$$

Define $C_1$ as the infimum of all achievable $R_0$.

Theorem 3: For $N$ discrete random variables $\mathbf{X}$ with joint distribution $p(\mathbf{x})$,

$$C_1 = C(\mathbf{X}). \tag{27}$$

The proof of Theorem 3 is a direct extension of the proof for two discrete random variables in [4] and hence is omitted.

The second approach to interpreting the common information of discrete random variables uses distribution approximation. Let $(\mathbf{X}_1, \dots, \mathbf{X}_n)$ be i.i.d. copies of $\mathbf{X}$ with distribution $p(\mathbf{x})$, i.e., the joint distribution of $(\mathbf{X}_1, \dots, \mathbf{X}_n)$ is

$$p^{(n)}(\mathbf{x}_1, \dots, \mathbf{x}_n) = \prod_{k=1}^n p(\mathbf{x}_k). \tag{28}$$

An $(n, M, \Delta)$ generator consists of the following:

• a message set $\mathcal{W}$ with cardinality $M$;

• for all $w \in \mathcal{W}$, probability distributions $q_i^{(n)}(x_i^n|w)$, for $i = 1, 2, \dots, N$.

Define the probability distribution on $\mathcal{X}_1^n \times \cdots \times \mathcal{X}_N^n$

$$q^{(n)}(\mathbf{x}_1, \dots, \mathbf{x}_n) = \sum_{w \in \mathcal{W}} \frac{1}{M} \prod_{i=1}^N q_i^{(n)}(x_i^n|w). \tag{29}$$

Let

$$\Delta = D_n\big(q^{(n)}; p^{(n)}\big) = \frac{1}{n} \sum q^{(n)}(\mathbf{x}_1, \dots, \mathbf{x}_n) \log \frac{q^{(n)}(\mathbf{x}_1, \dots, \mathbf{x}_n)}{p^{(n)}(\mathbf{x}_1, \dots, \mathbf{x}_n)}, \tag{30}$$

where $p^{(n)}$ is defined in (28) and $q^{(n)}$ is defined in (29).

A number $R$ is said to be achievable if for all $\epsilon > 0$ and for $n$ sufficiently large, there exists an $(n, M, \Delta)$ generator with $M \le 2^{nR}$ and $\Delta \le \epsilon$. Define $C_2$ as the infimum of all achievable $R$.

Theorem 4: For $N$ discrete random variables $\mathbf{X}$ with joint distribution $p(\mathbf{x})$,

$$C_2 = C(\mathbf{X}). \tag{31}$$

The proof can be constructed in the same way as that of [4, Theorems 5.2 and 6.2] and hence is omitted.
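The generator construction in (29)-(30) can be checked on a toy case. The sketch below evaluates (29) and (30) for a single letter ($n = 1$) only, with $M = 2$ messages; the processor `bsc_generator` and all parameter values are illustrative assumptions. When each processor reproduces the channel from $S$ to $X_i$ of Section V-A, the output distribution matches the target exactly; a mismatched crossover probability gives a strictly positive divergence. Theorem 4 concerns the limit of large $n$, which this toy evaluation does not capture.

```python
from itertools import product
from math import log2


def generator_pmf(q_cond, M, n_vars, alphabet=(0, 1)):
    """Single-letter (n = 1) form of (29): q(x) = sum_w (1/M) prod_i q_i(x_i|w)."""
    q = {}
    for x in product(alphabet, repeat=n_vars):
        total = 0.0
        for w in range(M):
            term = 1.0 / M
            for i, xi in enumerate(x):
                term *= q_cond(i, xi, w)
            total += term
        q[x] = total
    return q


def divergence(q, p):
    """D(q || p) in bits, i.e. (30) with n = 1."""
    return sum(qx * log2(qx / p[x]) for x, qx in q.items() if qx > 0)


# Target: the N = 2 binary source of Section V-A with theta = 1/2, a1 = 0.1.
a1 = 0.1
p = {x: 0.5 * (1 - a1) ** (2 - sum(x)) * a1 ** sum(x)
     + 0.5 * a1 ** (2 - sum(x)) * (1 - a1) ** sum(x)
     for x in product((0, 1), repeat=2)}


def bsc_generator(crossover):
    """Hypothetical processor: each output flips the message bit w."""
    return lambda i, xi, w: 1 - crossover if xi == w else crossover


print(divergence(generator_pmf(bsc_generator(0.1), 2, 2), p))  # matched
print(divergence(generator_pmf(bsc_generator(0.2), 2, 2), p))  # mismatched
```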

IV. The Lossy Source Coding Interpretation of Wyner's Common Information 

The common information defined in (1) and (19) applies equally to continuous random variables. However, such definitions are only meaningful when they are associated with concrete operational interpretations. In this section, we develop a lossy source coding interpretation of Wyner's common information using the Gray-Wyner network. While this new interpretation holds for the general case of $N$ dependent random variables, we elect to present coding theorems involving only a pair of dependent variables for ease of notation and presentation.
A. Lossy Gray-Wyner source coding

Given a two-dimensional source $(X_1, X_2) \sim p(x_1, x_2)$, for any $(D_1, D_2) \ge 0$, a number $R_0$ is said to be $(D_1, D_2)$-achievable if for any $\epsilon > 0$, there exists, for $n$ sufficiently large, an $(n, M_0, M_1, M_2, \Delta_1, \Delta_2)$ code with

$$M_0 \le 2^{nR_0}, \tag{32}$$
$$\frac{1}{n} \sum_{i=0}^2 \log M_i \le R_{X_1X_2}(D_1, D_2) + \epsilon, \tag{33}$$
$$\Delta_1 \le D_1 + \epsilon, \quad \Delta_2 \le D_2 + \epsilon. \tag{34}$$

Define $C_3(D_1, D_2)$ as the infimum of all $R_0$'s that are $(D_1, D_2)$-achievable. Thus, $C_3(D_1, D_2)$ is the minimum common message rate for the Gray-Wyner network with sum rate $R_{X_1X_2}(D_1, D_2)$ while satisfying the distortion constraints. Since $R_{X_1X_2}(D_1, D_2)$ is always $(D_1, D_2)$-achievable (by sending a joint description of both sources over the common link and leaving the private links idle), it is obvious that

$$C_3(D_1, D_2) \le R_{X_1X_2}(D_1, D_2). \tag{35}$$

The following theorem gives a precise characterization of $C_3(D_1, D_2)$.

Theorem 5:

$$C_3(D_1, D_2) = C(D_1, D_2), \tag{36}$$

where $C(D_1, D_2)$ is the solution of the following optimization problem:

$$\inf I(X_1, X_2; W) \tag{37}$$
$$\text{subject to } R_{X_1|W}(D_1) + R_{X_2|W}(D_2) + I(X_1, X_2; W) = R_{X_1X_2}(D_1, D_2).$$

Proof: See Appendix A. ■
The authors in [17] gave an alternative characterization of $C_3(D_1, D_2)$. Define

$$C^*(D_1, D_2) = \inf I(X_1, X_2; W),$$

where the infimum is taken over all joint distributions of $X_1, X_2, X_1^*, X_2^*, W$ such that

$$X_1^* - W - X_2^*, \tag{38}$$
$$(X_1, X_2) - (X_1^*, X_2^*) - W, \tag{39}$$

where $(X_1^*, X_2^*)$ achieves $R_{X_1X_2}(D_1, D_2)$. It was shown in [17] that $C_3(D_1, D_2) = C^*(D_1, D_2)$. This, combined with Theorem 5, establishes that

$$C(D_1, D_2) = C^*(D_1, D_2). \tag{40}$$

$C(D_1, D_2)$ is derived from the rate distortion region $\mathcal{R}_2(D_1, D_2)$ given in Theorem 2, while the authors in [17] chose to derive $C^*(D_1, D_2)$ from an alternative characterization of $\mathcal{R}_2(D_1, D_2)$ given in [23]. In Appendix B, we provide a direct proof of (40) for completeness. Also, as shown in Appendix B, a necessary condition for the equality constraint in the optimization problem (37) is

$$R_{X_1X_2|W}(D_1, D_2) = R_{X_1|W}(D_1) + R_{X_2|W}(D_2).$$
B. The relation between $C_3(D_1, D_2)$ and the common information

Given our characterization of $C_3(D_1, D_2)$ in Theorem 5, we now establish its connection with $C(X_1, X_2)$, which leads to a new interpretation of Wyner's common information. We begin with the following two lemmas.
Lemma 4: Let $W$ be the random variable that achieves the common information of $X_1$ and $X_2$. If

$$R_{X_1X_2|W}(D_1, D_2) + C(X_1, X_2) = R_{X_1X_2}(D_1, D_2),$$

then

$$C_3(D_1, D_2) \le C(X_1, X_2). \tag{41}$$

Lemma 4 is a direct consequence of Theorem 5, as the Markov chain $X_1 - W - X_2$ implies $R_{X_1X_2|W}(D_1, D_2) = R_{X_1|W}(D_1) + R_{X_2|W}(D_2)$. Thus, the equality constraint in (37) is satisfied. Inequality (41) follows as

$$C_3(D_1, D_2) = C(D_1, D_2) \le I(X_1, X_2; W) = C(X_1, X_2).$$

The next lemma gives a sufficient condition under which $C_3(D_1, D_2) \ge C(X_1, X_2)$ holds.

Lemma 5: For any distortion pair $(D_1, D_2)$, if the rate distortion function satisfies

$$R_{X_1X_2}(D_1, D_2) = R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1;X_2), \tag{42}$$

then we have

$$C_3(D_1, D_2) \ge C(X_1, X_2).$$

Proof: See Appendix C. ■
Lemmas 4 and 5, together with the relations among the marginal, joint and conditional rate distortion functions described in Lemmas 1 and 2, allow us to determine a region in which $C_3(D_1, D_2)$ equals the common information.

Theorem 6: Let random variables $X_1$, $X_2$ be distributed as $p(x_1, x_2)$ on $\mathcal{X}_1 \times \mathcal{X}_2$ such that $p(x_2|x_1) > 0$ for all $x_1 \in \mathcal{X}_1$, $x_2 \in \mathcal{X}_2$. Let the reproduction alphabets be $\hat{\mathcal{X}}_1 = \mathcal{X}_1$, $\hat{\mathcal{X}}_2 = \mathcal{X}_2$, and let the two per-letter distortion measures $d_1(x_1, \hat{x}_1)$, $d_2(x_2, \hat{x}_2)$ satisfy

$$d_i(x_i, \hat{x}_i) > d_i(x_i, x_i) = 0, \quad \hat{x}_i \ne x_i,\ i = 1, 2. \tag{43}$$

Then there exists a strictly positive surface $\boldsymbol{\gamma} = (\gamma_1, \gamma_2)$ such that, for $(D_1, D_2) \le \boldsymbol{\gamma}$,

$$C_3(D_1, D_2) = C(X_1, X_2). \tag{44}$$

Proof: See Appendix D. ■
Theorem 6 shows that Wyner's common information is precisely the smallest common message rate $C_3(D_1, D_2)$ of the Gray-Wyner network for a certain range of distortion constraints when the total rate is arbitrarily close to the rate distortion function with joint decoding. Since the common information is a function only of the joint distribution, and hence is a constant for a given $p(x_1, x_2)$, it is surprising that the smallest common rate $C_3(D_1, D_2)$ remains constant even if the distortion constraints vary, as long as they are in a specific distortion region.

While Theorem 6 establishes that $C_3(D_1, D_2) = C(X_1, X_2)$ for $(D_1, D_2) \le \boldsymbol{\gamma}$, it does not specify the value of the positive distortion vector $\boldsymbol{\gamma}$. Let $\mathcal{D}^0 = \{(D_1, D_2) : R_{X_1X_2}(D_1, D_2) = C(X_1, X_2)\}$ be the two-dimensional distortion surface on which the joint rate distortion function equals the common information; then we must have $\boldsymbol{\gamma} \le \mathcal{D}^0$. This is because if $\boldsymbol{\gamma} > \mathcal{D}^0$, then there exist $(D_1, D_2)$ and $(D_1', D_2') \in \mathcal{D}^0$ such that $\boldsymbol{\gamma} > (D_1, D_2) > (D_1', D_2')$ and $C_3(D_1, D_2) \le R_{X_1X_2}(D_1, D_2) < R_{X_1X_2}(D_1', D_2') = C(X_1, X_2)$, which contradicts Theorem 6. Now let us consider a particular point on the surface $\mathcal{D}^0$. Let $W$ be the auxiliary random variable that achieves $C(X_1, X_2)$. Suppose there exists a distortion pair $(D_1^0, D_2^0)$ satisfying, for $i = 1, 2$,

$$R_{X_i}(D_i^0) = I(X_i; W), \qquad D_i^0 = \inf_{\hat{x}_i(\cdot)} \mathbb{E}\, d_i\big(X_i, \hat{x}_i(W)\big), \tag{45}$$

where $\hat{x}_1(w)$, $\hat{x}_2(w)$ are deterministic functions. Under this assumption, we can show that $R_{X_1X_2}(D_1^0, D_2^0) = I(X_1, X_2; W)$. Therefore, the joint rate distortion function $R_{X_1X_2}(D_1^0, D_2^0)$ not only equals the common information but is also achieved by the auxiliary random variable $W$. Furthermore, it is easy to show that

$$C_3(D_1^0, D_2^0) = C(X_1, X_2), \tag{46}$$

using Lemma 5 and the fact that $C_3(D_1^0, D_2^0) \le R_{X_1X_2}(D_1^0, D_2^0)$. This means that in the Gray-Wyner network, with the total rate equal to $R_{X_1X_2}(D_1^0, D_2^0)$, the scheme to transmit the pair of sources $(X_1^n, X_2^n)$ within the distortion constraints $(D_1^0, D_2^0)$ is to communicate $W$ to the two receivers using the common channel.
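The appeal to Lemma 5 rests on (42) holding at $(D_1^0, D_2^0)$. A short derivation fills in this step; it uses only the chain rule for mutual information, the Markov chain $X_1 - W - X_2$ (so that $I(X_1;X_2|W) = 0$), and the two facts stated above, namely $R_{X_i}(D_i^0) = I(X_i;W)$ and $R_{X_1X_2}(D_1^0, D_2^0) = I(X_1, X_2; W)$:

$$\begin{aligned}
I(X_1, X_2; W) &= I(X_1; W) + I(X_2; W | X_1) \\
&= I(X_1; W) + I(X_2; W, X_1) - I(X_1; X_2) \\
&= I(X_1; W) + I(X_2; W) + I(X_2; X_1 | W) - I(X_1; X_2) \\
&= I(X_1; W) + I(X_2; W) - I(X_1; X_2).
\end{aligned}$$

Substituting $I(X_i; W) = R_{X_i}(D_i^0)$ and $I(X_1, X_2; W) = R_{X_1X_2}(D_1^0, D_2^0)$ gives $R_{X_1X_2}(D_1^0, D_2^0) = R_{X_1}(D_1^0) + R_{X_2}(D_2^0) - I(X_1;X_2)$, which is (42) at this distortion pair.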

Let us now decrease the distortion constraints from $(D_1^0, D_2^0)$ to $(D_1, D_2) \le (D_1^0, D_2^0)$. The question is whether the rate $C(X_1, X_2)$ is $(D_1, D_2)$-achievable, i.e., whether it is possible to transmit the sources $(X_1^n, X_2^n)$ with smaller distortions $(D_1, D_2)$ with the sum rate at $R_{X_1X_2}(D_1, D_2)$ while keeping the common rate at $C(X_1, X_2)$. In the following, we identify a sufficient condition for $C_3(D_1, D_2) = C(X_1, X_2)$ for successively refinable sources. A source $X$ with distortion measure $d(x, \hat{x})$ is said to be successively refinable from a coarser distortion $\delta_1$ to a finer distortion $\delta_2$ ($\delta_1 > \delta_2$) if it can be encoded in two stages in which the optimal description at the second stage is a refinement of the optimal description at the first stage [27]. A similar definition applies to vector sources with individual distortion constraints; the details can be found in [30].

In the following theorem, we give a sufficient condition under which $C_3(D_1, D_2) = C(X_1, X_2)$ for any $(D_1, D_2) \le (D_1^0, D_2^0)$. This sufficient condition ensures the optimality of a two-stage encoding scheme: first encode the common message with rate $C(X_1, X_2)$, which yields a coarse distortion $(D_1^0, D_2^0)$; then encode the two private messages with rates $R_{X_1|W}(D_1)$ and $R_{X_2|W}(D_2)$. The successive refinement assumption guarantees that the two-stage approach achieves the distortion $(D_1, D_2)$ and that the sum rate does not exceed the total rate $R_{X_1X_2}(D_1, D_2)$.

Theorem 7: Let $W$ be the auxiliary variable that achieves $C(X_1, X_2)$ and $(D_1^0, D_2^0)$ be a distortion pair satisfying (45). If the source $(X_1, X_2)$ is successively refinable from $(D_1^0, D_2^0)$ to $(D_1, D_2)$ for $(D_1, D_2) \le (D_1^0, D_2^0)$, and $X_i$ is successively refinable from $D_i^0$ to $D_i$ for $D_i \le D_i^0$, $i = 1, 2$, then

$$C_3(D_1, D_2) = C(X_1, X_2).$$

Proof: See Appendix E. ■
In the following section, we consider two examples involving successively refinable sources: binary random variables and bivariate Gaussian variables. For these two cases, we explicitly compute the function $C_3(D_1, D_2)$ and establish its connection with $C(X_1, X_2)$. The distortion pairs $(D_1^0, D_2^0)$ satisfying (45) are identified for both cases; thus, Theorem 7 can be directly applied.

V. Examples 

A. Binary random variables 

Let $S \sim \mathrm{Bern}(\theta)$ for $0 < \theta < 1$, i.e., $S \in \{0, 1\}$ and $P(S = 1) = \theta$. Let $X_i$, $i = 1, \dots, N$, be the output of a binary symmetric channel (BSC) with crossover probability $a_1$ ($0 < a_1 < \frac{1}{2}$) and with $S$ as input. The BSCs are independent of each other. Thus,

$$p(x_1, \dots, x_N | s) = \prod_{i=1}^N p(x_i|s),$$

where

$$p(x_i|s) = \begin{cases} 1 - a_1, & \text{if } x_i = s, \\ a_1, & \text{otherwise}, \end{cases} \tag{48}$$

for $x_i \in \{0, 1\}$. Therefore, the joint distribution of $X_1, X_2, \dots, X_N$ is

$$p(x_1, x_2, \dots, x_N) = \sum_{s \in \{0,1\}} p(s) \prod_{i=1}^N p(x_i|s) = \theta (1 - a_1)^{t_N} a_1^{N - t_N} + (1 - \theta)\, a_1^{t_N} (1 - a_1)^{N - t_N}, \tag{47}$$

where $t_N = \sum_{i=1}^N x_i$.

For N = 2, the joint distribution of Xi,X2 is given by the following probability matrix, 

0{1 - ai)"^ + {1 - 0)al ai(l-oi) 

ai(l-ai) Oal + (1 - e)il - ai)^ 

It has been shown by Witsenhausen [13] that the common information of $X_1,X_2$ is achieved with $W$ being $S$. That is,

$$C(X_1,X_2)=I(X_1X_2;S)=H(X_1,X_2)-2h(a_1), \qquad (49)$$

where $h(\cdot)$ is the binary entropy function. When $\theta=\tfrac12$, $(X_1,X_2)$ is a doubly symmetric binary source (DSBS), whose common information was derived by Wyner [4] using a different approach.
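As a numerical sanity check of (49), the following Python sketch builds $p(s,x_1,x_2)$ from the BSC model with $P(S=1)=\theta$, marginalizes out $S$ to obtain the probability matrix of $(X_1,X_2)$, and compares $I(X_1X_2;S)$ computed from entropies with the closed form $H(X_1,X_2)-2h(a_1)$. Entropies are in bits; the parameter values and helper names are illustrative assumptions.

```python
import itertools
import math


def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def entropy(probs):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


def bsc(x, s, a1):
    """p(x | s) for a BSC with crossover probability a1."""
    return 1 - a1 if x == s else a1


def common_info_pair(theta, a1):
    """Return (p(x1, x2), I(X1X2; S) from entropies, closed form H(X1,X2) - 2h(a1))."""
    # Joint pmf p(s, x1, x2) = p(s) p(x1|s) p(x2|s), with P(S = 1) = theta.
    p_sxx = {}
    for s, x1, x2 in itertools.product((0, 1), repeat=3):
        p_s = theta if s == 1 else 1 - theta
        p_sxx[(s, x1, x2)] = p_s * bsc(x1, s, a1) * bsc(x2, s, a1)
    # Marginalize out S to get the 2x2 probability matrix of (X1, X2).
    p_xx = {
        (x1, x2): p_sxx[(0, x1, x2)] + p_sxx[(1, x1, x2)]
        for x1, x2 in itertools.product((0, 1), repeat=2)
    }
    # I(X1X2; S) = H(X1, X2) + H(S) - H(S, X1, X2).
    direct = entropy(p_xx.values()) + h(theta) - entropy(p_sxx.values())
    closed = entropy(p_xx.values()) - 2 * h(a1)
    return p_xx, direct, closed


p_xx, direct, closed = common_info_pair(theta=0.3, a1=0.1)
print(p_xx)
print(direct, closed)
```

The two printed values agree up to floating-point rounding, since $H(X_1X_2|S)=2h(a_1)$ when the two channels are independent given $S$.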
We now obtain the common information for $N$ variables.

Proposition 1: Let $S\sim\mathrm{Bern}(\theta)$ and let $X_i$, $i=1,\dots,N$, be the outputs of independent BSCs with common input $S$ and crossover probability $0<a_1<1/2$. Then for any $N\ge 2$, the common information of $X_1,\dots,X_N$ is given by

$$C(X_1,\dots,X_N)=I(X_1,\dots,X_N;S). \qquad (50)$$
Proof: That $C(X_1,\dots,X_N)\le I(X_1,\dots,X_N;S)$ follows from the definition of the common information (19), since $S$ renders $X_1,\dots,X_N$ conditionally independent. The inequality $C(X_1,\dots,X_N)\ge I(X_1,\dots,X_N;S)$ can be proved by contradiction. Suppose there exists a $W$ such that

$$C(X_1,\dots,X_N)=I(X_1,\dots,X_N;W)<I(X_1,\dots,X_N;S), \qquad (51)$$

i.e., $C(X_1,\dots,X_N)$ is achieved by $W$ and is strictly less than $I(X_1,\dots,X_N;S)$. Since $W$ induces conditional independence of $X_1,\dots,X_N$, we have $I(X_1,\dots,X_N;W)=H(X_1,\dots,X_N)-\sum_{i=1}^{N}H(X_i|W)$, and likewise for $S$; hence, from (51),

$$\sum_{i=1}^{N}H(X_i|W)>\sum_{i=1}^{N}H(X_i|S).$$

Thus, there must exist two random variables $X_k,X_j$, $k\ne j$, $k,j\in\{1,\dots,N\}$, such that

$$H(X_k|W)+H(X_j|W)>H(X_k|S)+H(X_j|S).$$

Given that the sequence $\{X_1,\dots,X_N\}$ is exchangeable [31], $(X_k,X_j)$ has the same joint distribution as $(X_1,X_2)$. Thus,

$$C(X_1,X_2)=C(X_k,X_j)\le I(X_k,X_j;W)<I(X_k,X_j;S)=I(X_1,X_2;S).$$

This, however, contradicts the fact that $S$ achieves $C(X_1,X_2)$. Thus the proposition is proved. ■
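The quantity in (50) grows with $N$ but stays below $H(S)$, because adding observations of $S$ can only increase $I(X_1,\dots,X_N;S)$. The sketch below, which assumes entropies in bits and builds the joint pmf by summing $p(s)\prod_i p(x_i|s)$ over $s$ with $P(S=1)=\theta$, evaluates $C(X_1,\dots,X_N)=H(X_1,\dots,X_N)-Nh(a_1)$ by brute-force enumeration of $\{0,1\}^N$; this is only practical for small $N$, and the function names are illustrative.

```python
import itertools
import math


def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def pmf_xn(theta, a1, n):
    """Joint pmf of X_1..X_N, with P(S = 1) = theta and t = number of ones in x."""
    pmf = {}
    for x in itertools.product((0, 1), repeat=n):
        t = sum(x)
        pmf[x] = (theta * (1 - a1) ** t * a1 ** (n - t)
                  + (1 - theta) * a1 ** t * (1 - a1) ** (n - t))
    return pmf


def common_info_bsc(theta, a1, n):
    """C(X_1..X_N) = I(X^N; S) = H(X^N) - N h(a1), per Proposition 1."""
    h_joint = -sum(q * math.log2(q) for q in pmf_xn(theta, a1, n).values() if q > 0)
    # Given S, each X_i is an independent BSC(a1) output, so H(X^N | S) = N h(a1).
    return h_joint - n * h(a1)


for n in range(2, 8):
    print(n, round(common_info_bsc(0.5, 0.1, n), 4))
```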
We now characterize the minimum common rate $C_3(D_1,D_2)$ for a DSBS.
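Proposition 2: Consider a DSBS $(X_1,X_2)$ with distribution

$$p(x_1,x_2)=\begin{cases}\tfrac12(1-a_0), & \text{if } x_1=x_2,\\ \tfrac12 a_0, & \text{otherwise},\end{cases} \qquad (52)$$

where, without loss of generality, $0<a_0<1/2$. Let $a_1$ be such that $a_0=2a_1(1-a_1)$, $0<a_1<1/2$. With Hamming distortion $d_1=d_2=d_H$, we have

$$C_3(D_1,D_2)=\begin{cases}C(X_1,X_2), & (D_1,D_2)\in\mathcal{E}_{10},\\ R_{X_1X_2}(D_1,D_2), & (D_1,D_2)\in\mathcal{E}_2\cup\mathcal{E}_3,\\ 0, & (D_1,D_2)\ge(\tfrac12,\tfrac12),\end{cases} \qquad (53)$$

$$C(X_1,X_2)\le C_3(D_1,D_2)\le R_{X_1X_2}(D_1,D_2),\quad (D_1,D_2)\in\mathcal{E}_{11}, \qquad (54)$$

where

$$\begin{aligned}
\mathcal{E}_{10}&=\{(D_1,D_2):0\le D_i\le a_1,\ i=1,2\},\\
\mathcal{E}_{11}&=\mathcal{E}_{10}^c\cap\{(D_1,D_2):D_1+D_2-2D_1D_2\le a_0\},\\
\mathcal{E}_2&=\mathcal{E}_{10}^c\cap\mathcal{E}_{11}^c\cap\Big\{(D_1,D_2):\max\Big\{\tfrac{D_2-D_1}{1-2D_1},\tfrac{D_1-D_2}{1-2D_2}\Big\}\le a_0\Big\},\\
\mathcal{E}_3&=\mathcal{E}_{10}^c\cap\mathcal{E}_{11}^c\cap\mathcal{E}_2^c\cap\{(D_1,D_2):D_i\le\tfrac12,\ i=1,2\}.
\end{aligned}$$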
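Proof: For $X_i\sim\mathrm{Bern}(1/2)$, $i=1,2$, with Hamming distortion, the rate distortion function is

$$R_{X_i}(D_i)=\begin{cases}1-h(D_i), & 0\le D_i\le\tfrac12,\\ 0, & D_i>\tfrac12.\end{cases} \qquad (55)$$

The joint rate distortion function of the DSBS $(X_1,X_2)$ is given by [30]

$$R_{X_1X_2}(D_1,D_2)=\begin{cases}1+h(a_0)-h(D_1)-h(D_2), & (D_1,D_2)\in\mathcal{E}_1,\\ 1-(1-a_0)\,h\!\left(\frac{D_1+D_2-a_0}{2(1-a_0)}\right)-a_0\,h\!\left(\frac{D_1-D_2+a_0}{2a_0}\right), & (D_1,D_2)\in\mathcal{E}_2,\\ 1-h(\min\{D_1,D_2\}), & (D_1,D_2)\in\mathcal{E}_3,\end{cases} \qquad (56)$$

where $\mathcal{E}_1=\mathcal{E}_{10}\cup\mathcal{E}_{11}$, with $\mathcal{E}_{10}$, $\mathcal{E}_{11}$, $\mathcal{E}_2$ and $\mathcal{E}_3$ defined in Proposition 2. Therefore, since $I(X_1;X_2)=1-h(a_0)$ for this DSBS, $R_{X_1}(D_1)+R_{X_2}(D_2)-I(X_1;X_2)=R_{X_1X_2}(D_1,D_2)$ for $(D_1,D_2)\in\mathcal{E}_1$. From Lemma 5, we have, for $(D_1,D_2)\in\mathcal{E}_1$,

$$C_3(D_1,D_2)\ge C(X_1,X_2). \qquad (57)$$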
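The region bookkeeping of Proposition 2 and the piecewise formula (56) are easy to get wrong, so the sketch below encodes them directly. It checks that $R_{X_1}(D_1)+R_{X_2}(D_2)-I(X_1;X_2)$ coincides with (56) at a point of $\mathcal{E}_{11}$, and that (56) is continuous across the $\mathcal{E}_{10}/\mathcal{E}_2$ boundary (on the diagonal $D_1=D_2=a_1$) and across the $\mathcal{E}_2/\mathcal{E}_3$ boundary $D_2=D_1+a_0(1-2D_1)$. It assumes $0\le D_i<\tfrac12$ and rates in bits; the function names are illustrative.

```python
import math


def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def a1_from_a0(a0):
    """Solve a0 = 2 a1 (1 - a1) for 0 < a1 < 1/2."""
    return (1 - math.sqrt(1 - 2 * a0)) / 2


def dsbs_region(d1, d2, a0):
    """Region of Proposition 2 containing (D1, D2); assumes 0 <= Di < 1/2."""
    a1 = a1_from_a0(a0)
    if d1 <= a1 and d2 <= a1:
        return "E10"
    if d1 + d2 - 2 * d1 * d2 <= a0:
        return "E11"
    if max((d2 - d1) / (1 - 2 * d1), (d1 - d2) / (1 - 2 * d2)) <= a0:
        return "E2"
    return "E3"


def dsbs_joint_rd(d1, d2, a0):
    """Joint rate distortion function (56) of the DSBS, in bits."""
    region = dsbs_region(d1, d2, a0)
    if region in ("E10", "E11"):  # E1 = E10 U E11
        return 1 + h(a0) - h(d1) - h(d2)
    if region == "E2":
        return (1 - (1 - a0) * h((d1 + d2 - a0) / (2 * (1 - a0)))
                - a0 * h((d1 - d2 + a0) / (2 * a0)))
    return 1 - h(min(d1, d2))


def marginal_bound(d1, d2, a0):
    """R_X1(D1) + R_X2(D2) - I(X1; X2), with I(X1; X2) = 1 - h(a0)."""
    return (1 - h(d1)) + (1 - h(d2)) - (1 - h(a0))


a0 = 0.2
print(dsbs_region(0.05, 0.05, a0), dsbs_joint_rd(0.05, 0.05, a0))
```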

On the other hand, the conditional rate distortion function $R_{X_i|S}(D_i)$, $i=1,2$, is given by [19]

$$R_{X_i|S}(D_i)=\begin{cases}h(a_1)-h(D_i), & 0\le D_i\le a_1,\\ 0, & D_i>a_1.\end{cases}$$

Therefore, $R_{X_1|S}(D_1)+R_{X_2|S}(D_2)+I(X_1,X_2;S)=R_{X_1X_2}(D_1,D_2)$ is satisfied for $(D_1,D_2)\in\mathcal{E}_{10}$. By Theorem 5, $C_3(D_1,D_2)\le C(X_1,X_2)$ for $(D_1,D_2)\in\mathcal{E}_{10}$. Together with (57), and given that $\mathcal{E}_{10}\subset\mathcal{E}_1$, we have proved that for $(D_1,D_2)\in\mathcal{E}_{10}$,

$$C_3(D_1,D_2)=C(X_1,X_2).$$
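A quick numerical check of the identity used with Theorem 5: on $\mathcal{E}_{10}$, the conditional rates given $S$ plus $I(X_1,X_2;S)=1+h(a_0)-2h(a_1)$ add up exactly to the $\mathcal{E}_1$ branch of (56), while at a point of $\mathcal{E}_{11}$ with $D_1>a_1$ the left side overshoots by $h(D_1)-h(a_1)$, because $R_{X_1|S}(D_1)$ is clipped at zero. Rates are in bits; the parameter values and function name are illustrative.

```python
import math


def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def check_conditional_identity(d1, d2, a0):
    """Return (LHS, RHS) of R_{X1|S}(D1) + R_{X2|S}(D2) + I(X1X2; S) = R_{X1X2}(D1, D2).

    RHS uses the E1 branch of (56), so the comparison is meaningful only on E1.
    """
    a1 = (1 - math.sqrt(1 - 2 * a0)) / 2  # a0 = 2 a1 (1 - a1)

    def r_cond(d):
        # Conditional rate distortion function R_{Xi|S}(D).
        return h(a1) - h(d) if d <= a1 else 0.0

    common = 1 + h(a0) - 2 * h(a1)  # I(X1X2; S) = C(X1, X2) for the DSBS
    lhs = r_cond(d1) + r_cond(d2) + common
    rhs = 1 + h(a0) - h(d1) - h(d2)
    return lhs, rhs


print(check_conditional_identity(0.05, 0.08, 0.2))  # a point in E10
print(check_conditional_identity(0.15, 0.02, 0.2))  # a point in E11
```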

For $(D_1,D_2)\in\mathcal{E}_2$, we only need to show that $C_3(D_1,D_2)\ge R_{X_1X_2}(D_1,D_2)$, since $C_3(D_1,D_2)\le R_{X_1X_2}(D_1,D_2)$ always holds. It was shown in [30] that the backward test channel that achieves $R_{X_1X_2}(D_1,D_2)$ is given by

$$X_1=\hat X_1\oplus Z_1,\qquad X_2=\hat X_2\oplus Z_2,$$

where $(\hat X_1,\hat X_2)$ and $(Z_1,Z_2)$ are binary vectors independent of each other, with probability mass functions (rows indexed by the first component, columns by the second) given respectively as

$$p_{\hat X_1\hat X_2}=\begin{bmatrix}\tfrac12 & 0\\ 0 & \tfrac12\end{bmatrix},\qquad p_{Z_1Z_2}=\frac12\begin{bmatrix}2-a_0-D_1-D_2 & D_2-D_1+a_0\\ D_1-D_2+a_0 & D_1+D_2-a_0\end{bmatrix}.$$

Therefore, the $(\hat X_1,\hat X_2)$ that achieves $R_{X_1X_2}(D_1,D_2)$ satisfies

$$\hat X_2=\hat X_1.$$

For the characterization $C^*(D_1,D_2)$ of $C_3(D_1,D_2)$, any $W$ satisfying the Markov chain $\hat X_1-W-\hat X_2$ must satisfy $H(\hat X_1|W)=I(\hat X_1;\hat X_2|W)=0$. Thus, $\hat X_1$ is a function of $W$, and we have

$$I(X_1,X_2;W)=I(X_1,X_2;W,\hat X_1)\ge I(X_1,X_2;\hat X_1)=R_{X_1X_2}(D_1,D_2).$$

Therefore, $C_3(D_1,D_2)=R_{X_1X_2}(D_1,D_2)$.

The region $\mathcal{E}_3$ is a degenerate one. For example, $R_{X_1X_2}(D_1,D_2)=R_{X_1}(D_1)$ if $a_0\le\frac{D_2-D_1}{1-2D_1}$ and $D_i\le\frac12$, $i=1,2$. This implies that the optimal coding scheme is to ignore $X_2$ and optimally compress $X_1$. Then $X_2$ can be estimated from $\hat X_1$ with distortion $D_1+a_0-2a_0D_1\le D_2$. The case $a_0\le\frac{D_1-D_2}{1-2D_2}$ is dealt with similarly. Hence, as in the region $\mathcal{E}_2$, $C_3(D_1,D_2)=R_{X_1X_2}(D_1,D_2)$. ■

Fig. 5. The relation of $C_3(D,D)$ and $D$ for the DSBS with $D_1=D_2=D$ (vertical axis $C_3(D,D)$, horizontal axis $D$; the plot marks $C(X_1,X_2)$, $R_{X_1X_2}(D,D)$ and $a_1$).
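The two facts used for $\mathcal{E}_2$ and $\mathcal{E}_3$ can be checked numerically. The sketch assumes the $\mathcal{E}_2$ test channel with $\hat X_1=\hat X_2$ uniform, so that $I(X_1X_2;\hat X_1\hat X_2)=H(X_1,X_2)-H(Z_1,Z_2)=1+h(a_0)-H(Z_1,Z_2)$, and compares this with the $\mathcal{E}_2$ branch of (56); it also confirms that $D_1+a_0-2a_0D_1\le D_2$ is the same condition as $a_0\le\frac{D_2-D_1}{1-2D_1}$. Parameter values and helper names are illustrative; rates are in bits.

```python
import math


def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def entropy(probs):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(q * math.log2(q) for q in probs if q > 0)


def z_pmf(d1, d2, a0):
    """pmf of (Z1, Z2) in the E2 backward test channel, keyed by (z1, z2)."""
    return {
        (0, 0): (2 - a0 - d1 - d2) / 2,
        (0, 1): (d2 - d1 + a0) / 2,
        (1, 0): (d1 - d2 + a0) / 2,
        (1, 1): (d1 + d2 - a0) / 2,
    }


def e2_rate(d1, d2, a0):
    """E2 branch of (56), in bits."""
    return (1 - (1 - a0) * h((d1 + d2 - a0) / (2 * (1 - a0)))
            - a0 * h((d1 - d2 + a0) / (2 * a0)))


def degenerate_distortion(d1, a0):
    """Hamming distortion of X2 when it is read off X1_hat and only X1 is coded at D1."""
    # X2 xor X1_hat = Z1 xor V, with Z1 ~ Bern(D1) and V ~ Bern(a0) independent.
    return d1 * (1 - a0) + (1 - d1) * a0


a0, d1, d2 = 0.2, 0.15, 0.2  # (0.15, 0.2) lies in E2 for a0 = 0.2
z = z_pmf(d1, d2, a0)
rate = 1 + h(a0) - entropy(z.values())
print(rate, e2_rate(d1, d2, a0))
```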

The characterization of $C_3(D_1,D_2)$ is plotted in Fig. 4 as a function of the distortion constraints; $C_3(D_1,D_2)=C(X_1,X_2)$ in the shaded region. For the symmetric distortion constraint $D_1=D_2=D$, the relation between $C_3(D,D)$ and $D$ for the DSBS is given in Fig. 5.

Remarks: 

• The claim $C_3(D_1,D_2)=C(X_1,X_2)$ for $(D_1,D_2)\in\mathcal{E}_{10}$ can also be proved using Theorem 7. $R_{X_1X_2}(a_1,a_1)$ is achieved by the backward test channel $p_b(x_1,x_2|s)=p(x_1|s)p(x_2|s)$. The vector source $(X_1,X_2)$ is successively refinable for any $(D_1,D_2)\le(a_1,a_1)$ [30], and the scalar source $X_i$ is successively refinable for any $D_i\le a_1$, $i=1,2$ [27]. Thus, by Theorem 7, $C_3(D_1,D_2)=C(X_1,X_2)$ for $(D_1,D_2)\le(a_1,a_1)$.



• We have the full characterization of $C_3(D_1,D_2)$ in the distortion region except for the region $\mathcal{E}_{11}$. From the proof of Proposition 2, we know that $C_3(D_1,D_2)\ge C(X_1,X_2)$ for $(D_1,D_2)\in\mathcal{E}_{11}$, but the exact value of $C_3(D_1,D_2)$ in this region remains unknown.

• Let $(D_1,D_2)\le(D_1',D_2')\le(a_1,a_1)$. Then the rate $R_{X_1X_2}(D_1',D_2')$ is $(D_1,D_2)$-achievable as a common rate in the Gray-Wyner network, i.e., $R_{X_1X_2}(D_1',D_2')\ge C_3(D_1,D_2)$.

To show this, let $(\hat X_1,\hat X_2)$ achieve $R_{X_1X_2}(D_1',D_2')$. The backward test channel that achieves $R_{X_1X_2}(D_1',D_2')$ satisfies $p_b(x_1,x_2|\hat x_1,\hat x_2)=p_b(x_1|\hat x_1)p_b(x_2|\hat x_2)$, where

$$p_b(x_i|\hat x_i)=\begin{cases}1-D_i', & \text{if } x_i=\hat x_i,\\ D_i', & \text{otherwise},\end{cases}$$

for $i=1,2$. Then, for $(D_1,D_2)\le(D_1',D_2')\le(a_1,a_1)$, let the rate allocation of $R_0,R_1,R_2$ in the Gray-Wyner network be

$$\begin{aligned}R_0&=R_{X_1X_2}(D_1',D_2')=1+h(a_0)-h(D_1')-h(D_2'),\\ R_i&=R_{X_i|\hat X_1\hat X_2}(D_i)=R_{X_i|\hat X_i}(D_i)=h(D_i')-h(D_i),\quad i=1,2.\end{aligned} \qquad (58)$$

Since $R_0$, $R_1$ and $R_2$ in (58) sum up to $R_{X_1X_2}(D_1,D_2)$, $R_{X_1X_2}(D_1',D_2')$ is $(D_1,D_2)$-achievable.

The minimal $R_0$ satisfying (58) is exactly $C(X_1,X_2)$, which is achieved by letting $(D_1',D_2')=(a_1,a_1)$.
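The bookkeeping behind (58) is a telescoping sum: whatever $(D_1',D_2')$ is chosen between $(D_1,D_2)$ and $(a_1,a_1)$, the three rates add up to $R_{X_1X_2}(D_1,D_2)$, and the common rate is smallest at $(D_1',D_2')=(a_1,a_1)$ because $h$ is increasing on $[0,\tfrac12]$. A minimal sketch, assuming rates in bits and illustrative parameter values (the function name is hypothetical):

```python
import math


def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def gw_allocation(d1, d2, d1p, d2p, a0):
    """Rate split (58): common rate at (D1', D2'), private refinements down to (D1, D2)."""
    r0 = 1 + h(a0) - h(d1p) - h(d2p)
    r1 = h(d1p) - h(d1)
    r2 = h(d2p) - h(d2)
    return r0, r1, r2


a0 = 0.2
a1 = (1 - math.sqrt(1 - 2 * a0)) / 2  # a0 = 2 a1 (1 - a1)
d1, d2 = 0.03, 0.05
for scale in (0.5, 0.8, 1.0):
    # Intermediate distortions (D1', D2') between (D1, D2) and (a1, a1).
    d1p, d2p = max(d1, scale * a1), max(d2, scale * a1)
    print(scale, [round(r, 4) for r in gw_allocation(d1, d2, d1p, d2p, a0)])
```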



B. Gaussian random variables 

In this section we consider bivariate Gaussian random variables $X_1,X_2$ with zero mean and covariance matrix

$$K_2=\begin{bmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{bmatrix}. \qquad (59)$$

The common information between this pair of Gaussian random variables is given in the following theorem.

Theorem 8: For two jointly Gaussian random variables $X_1,X_2$ with covariance matrix $K_2$, the common information is

$$C(X_1,X_2)=\frac12\log\frac{1+\rho}{1-\rho}. \qquad (60)$$

Proof: See Appendix F. ■
As the common information of $(X_1,X_2)$ is only a function of the correlation coefficient $\rho$, we consider, without loss of generality, the covariance matrix

$$K_2'=\begin{bmatrix}1 & \rho\\ \rho & 1\end{bmatrix}. \qquad (61)$$

The above result generalizes to multivariate Gaussian random variables with a certain covariance matrix structure; the proof can be constructed in a similar fashion.
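Corollary 1: For $N$ jointly Gaussian random variables $X_1,X_2,\dots,X_N$ with covariance matrix

$$K_N=\begin{bmatrix}1 & \rho & \cdots & \rho\\ \rho & 1 & \cdots & \rho\\ \vdots & \vdots & \ddots & \vdots\\ \rho & \rho & \cdots & 1\end{bmatrix}, \qquad (62)$$

the common information is

$$C(X_1,\dots,X_N)=\frac12\log\left(1+\frac{N\rho}{1-\rho}\right). \qquad (63)$$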
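Corollary 1 can be cross-checked against the construction used in (69) and (100): with $X_i=\sqrt{\rho}\,W+\sqrt{1-\rho}\,N_i$, the conditional covariance of $(X_1,\dots,X_N)$ given $W$ is $(1-\rho)I_N$, so $I(X_1,\dots,X_N;W)=\frac12\log\det K_N-\frac N2\log(1-\rho)$. The sketch computes $\det K_N$ numerically by Gaussian elimination (rather than using $\det K_N=(1-\rho)^{N-1}(1+(N-1)\rho)$) and compares the result with (63). It assumes $0<\rho<1$ and base-2 logarithms, and it only confirms that this particular $W$ attains (63); it does not prove optimality.

```python
import math


def det(matrix):
    """Determinant via Gaussian elimination without pivoting (fine for positive-definite input)."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for i in range(n):
        pivot = a[i][i]
        result *= pivot
        for r in range(i + 1, n):
            factor = a[r][i] / pivot
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return result


def equicorrelated(n, rho):
    """Covariance matrix K_N of (62): ones on the diagonal, rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]


def mi_with_w(n, rho):
    """I(X; W) in bits for X_i = sqrt(rho) W + sqrt(1 - rho) N_i."""
    # h(X) - h(X | W) = 0.5 log det K_N - 0.5 log det((1 - rho) I_N).
    return 0.5 * (math.log2(det(equicorrelated(n, rho))) - n * math.log2(1 - rho))


def closed_form(n, rho):
    """Right-hand side of (63), in bits."""
    return 0.5 * math.log2(1 + n * rho / (1 - rho))


for n in (2, 3, 5):
    print(n, mi_with_w(n, 0.6), closed_form(n, 0.6))
```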



We now characterize the minimum common rate $C_3(D_1,D_2)$ in the Gray-Wyner lossy source coding network for bivariate Gaussian random variables with covariance matrix $K_2'$ in (61). It was shown in [17] that for symmetric distortion, i.e., $D_1=D_2=D$,

$$C_3(D,D)=\begin{cases}C(X_1,X_2), & 0\le D\le 1-\rho,\\ R_{X_1X_2}(D,D), & 1-\rho<D\le 1,\\ 0, & D>1.\end{cases} \qquad (64)$$

We characterize $C_3(D_1,D_2)$ for general distortions $(D_1,D_2)$ in the following proposition.

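Proposition 3: For bivariate Gaussian random variables $X_1,X_2$ with zero mean, covariance matrix $K_2'$ and squared error distortion, we have

$$C_3(D_1,D_2)=\begin{cases}C(X_1,X_2), & (D_1,D_2)\in\mathcal{D}_{10},\\ R_{X_1X_2}(D_1,D_2), & (D_1,D_2)\in\mathcal{D}_2\cup\mathcal{D}_3,\\ 0, & (D_1,D_2)\ge(1,1),\end{cases} \qquad (65)$$

$$C(X_1,X_2)\le C_3(D_1,D_2)\le R_{X_1X_2}(D_1,D_2),\quad (D_1,D_2)\in\mathcal{D}_{11}, \qquad (66)$$

where

$$\begin{aligned}
\mathcal{D}_{10}&=\{(D_1,D_2):0\le D_i\le 1-\rho,\ i=1,2\},\\
\mathcal{D}_{11}&=\mathcal{D}_{10}^c\cap\{(D_1,D_2):D_1+D_2-D_1D_2\le 1-\rho^2\},\\
\mathcal{D}_2&=\mathcal{D}_{10}^c\cap\mathcal{D}_{11}^c\cap\Big\{(D_1,D_2):\min\Big\{\tfrac{1-D_1}{1-D_2},\tfrac{1-D_2}{1-D_1}\Big\}\ge\rho^2\Big\},\\
\mathcal{D}_3&=\mathcal{D}_{10}^c\cap\mathcal{D}_{11}^c\cap\mathcal{D}_2^c\cap\{(D_1,D_2):D_i\le 1,\ i=1,2\}.
\end{aligned} \qquad (67)$$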

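Proof: The joint rate distortion function for Gaussian random variables with squared error distortion [28]-[30] is given by

$$R_{X_1X_2}(D_1,D_2)=\begin{cases}\frac12\log\frac{1-\rho^2}{D_1D_2}, & (D_1,D_2)\in\mathcal{D}_1,\\ \frac12\log\frac{1-\rho^2}{D_1D_2-\left(\rho-\sqrt{(1-D_1)(1-D_2)}\right)^2}, & (D_1,D_2)\in\mathcal{D}_2,\\ \frac12\log\frac{1}{\min\{D_1,D_2\}}, & (D_1,D_2)\in\mathcal{D}_3,\end{cases} \qquad (68)$$

where $\mathcal{D}_1=\mathcal{D}_{10}\cup\mathcal{D}_{11}$. The marginal rate distortion function for $X_i\sim\mathcal{N}(0,1)$, $i=1,2$, is

$$R_{X_i}(D_i)=\begin{cases}\frac12\log\frac{1}{D_i}, & 0\le D_i\le 1,\\ 0, & D_i>1.\end{cases}$$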
Therefore, since $I(X_1;X_2)=\frac12\log\frac{1}{1-\rho^2}$, we have $R_{X_1}(D_1)+R_{X_2}(D_2)-I(X_1;X_2)=R_{X_1X_2}(D_1,D_2)$ for $(D_1,D_2)\in\mathcal{D}_1$. From Lemma 5, for $(D_1,D_2)\in\mathcal{D}_1$,

$$C_3(D_1,D_2)\ge C(X_1,X_2).$$

On the other hand, the random variable $W$ in the following decomposition of $X_1$ and $X_2$ achieves the common information:

$$X_i=\sqrt{\rho}\,W+\sqrt{1-\rho}\,N_i,\quad i=1,2, \qquad (69)$$

where $W,N_1,N_2$ are mutually independent standard Gaussian random variables. The conditional distribution of $X_i$ given $W$ is Gaussian with variance $1-\rho$. Hence, for $i=1,2$, the conditional rate distortion function is

$$R_{X_i|W}(D_i)=\begin{cases}\frac12\log\frac{1-\rho}{D_i}, & 0\le D_i\le 1-\rho,\\ 0, & D_i>1-\rho.\end{cases}$$

The condition $R_{X_1|W}(D_1)+R_{X_2|W}(D_2)+I(X_1,X_2;W)=R_{X_1X_2}(D_1,D_2)$ is satisfied for $(D_1,D_2)\in\mathcal{D}_{10}$. From Theorem 5, $C_3(D_1,D_2)\le C(X_1,X_2)$ for $(D_1,D_2)\in\mathcal{D}_{10}$. Since $\mathcal{D}_{10}\subset\mathcal{D}_1$, we have proved that for $(D_1,D_2)\in\mathcal{D}_{10}$,

$$C_3(D_1,D_2)=C(X_1,X_2).$$
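The two identities behind this part of the proof can be evaluated numerically from (68). The sketch encodes the regions of Proposition 3 and (68), then compares with $R_{X_1X_2}(D_1,D_2)$ both $R_{X_1}(D_1)+R_{X_2}(D_2)-I(X_1;X_2)$ (valid on $\mathcal{D}_1$) and $R_{X_1|W}(D_1)+R_{X_2|W}(D_2)+I(X_1,X_2;W)$ for the $W$ in (69) (valid on $\mathcal{D}_{10}$). It assumes $0<\rho<1$, $0<D_i<1$ and base-2 logarithms; the function names are illustrative.

```python
import math


def gaussian_region(d1, d2, rho):
    """Region of Proposition 3 containing (D1, D2); assumes 0 < Di < 1 and 0 < rho < 1."""
    if d1 <= 1 - rho and d2 <= 1 - rho:
        return "D10"
    if d1 + d2 - d1 * d2 <= 1 - rho ** 2:
        return "D11"
    if min((1 - d1) / (1 - d2), (1 - d2) / (1 - d1)) >= rho ** 2:
        return "D2"
    return "D3"


def gaussian_joint_rd(d1, d2, rho):
    """Joint rate distortion function (68), in bits."""
    region = gaussian_region(d1, d2, rho)
    if region in ("D10", "D11"):  # D1 = D10 U D11
        return 0.5 * math.log2((1 - rho ** 2) / (d1 * d2))
    if region == "D2":
        gap = rho - math.sqrt((1 - d1) * (1 - d2))
        return 0.5 * math.log2((1 - rho ** 2) / (d1 * d2 - gap ** 2))
    return 0.5 * math.log2(1 / min(d1, d2))


def via_marginals(d1, d2, rho):
    """R_X1(D1) + R_X2(D2) - I(X1; X2), with I(X1; X2) = 0.5 log2(1 / (1 - rho^2))."""
    return 0.5 * math.log2(1 / d1) + 0.5 * math.log2(1 / d2) - 0.5 * math.log2(1 / (1 - rho ** 2))


def via_common_part(d1, d2, rho):
    """R_{X1|W}(D1) + R_{X2|W}(D2) + I(X1X2; W) for the W in (69); needs Di <= 1 - rho."""
    common = 0.5 * math.log2((1 + rho) / (1 - rho))
    return 0.5 * math.log2((1 - rho) / d1) + 0.5 * math.log2((1 - rho) / d2) + common


rho = 0.6
print(gaussian_region(0.2, 0.3, rho), gaussian_joint_rd(0.2, 0.3, rho))
```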

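For $(D_1,D_2)\in\mathcal{D}_2$, it was shown in [30] that the $(\hat X_1,\hat X_2)$ that achieves $R_{X_1X_2}(D_1,D_2)$ satisfies

$$\hat X_2=\sqrt{\frac{1-D_2}{1-D_1}}\,\hat X_1,$$

i.e., $\hat X_1$ and $\hat X_2$ are linearly dependent. Hence, using the characterization $C^*(D_1,D_2)$, it is easy to show that the $W$ satisfying the Markov chains (38) and (39) must satisfy the two Markov chains

$$X_1X_2-\hat X_1-W-\hat X_2,\qquad X_1X_2-\hat X_2-W-\hat X_1.$$

Therefore, we have

$$I(X_1,X_2;W)=I(X_1,X_2;\hat X_1)=I(X_1,X_2;\hat X_1,\hat X_2),$$

which proves $C_3(D_1,D_2)=R_{X_1X_2}(D_1,D_2)$.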
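The rank-one reconstruction can be cross-checked against (68). Assuming the Gaussian backward channel $X_i=\hat X_i+Z_i$ with $\mathrm{Var}(\hat X_i)=1-D_i$ and $\hat X_2=\sqrt{(1-D_2)/(1-D_1)}\,\hat X_1$, the noise covariance is $K_Z=K_2'-K_{\hat X}$ and the rate is $\frac12\log\frac{\det K_2'}{\det K_Z}$. The sketch evaluates this at one illustrative point of $\mathcal{D}_2$, confirms that $K_Z$ is a valid (positive-definite) covariance there, and compares the rate with the $\mathcal{D}_2$ branch of (68). Function names and parameter values are assumptions for illustration; logs are base 2.

```python
import math


def d2_branch(d1, d2, rho):
    """D2 branch of (68), in bits."""
    gap = rho - math.sqrt((1 - d1) * (1 - d2))
    return 0.5 * math.log2((1 - rho ** 2) / (d1 * d2 - gap ** 2))


def rate_from_rank_one_reconstruction(d1, d2, rho):
    """I(X1X2; X1_hat X2_hat) for X = X_hat + Z with fully correlated reconstructions.

    Var(Xi_hat) = 1 - Di and Cov(X1_hat, X2_hat) = sqrt((1 - D1)(1 - D2)),
    so K_Z = K_X - K_Xhat with K_X = [[1, rho], [rho, 1]].
    """
    c12 = math.sqrt((1 - d1) * (1 - d2))
    kz = [[d1, rho - c12], [rho - c12, d2]]
    det_kz = kz[0][0] * kz[1][1] - kz[0][1] * kz[1][0]
    det_kx = 1 - rho ** 2
    return 0.5 * math.log2(det_kx / det_kz), kz


rho, d1, d2 = 0.8, 0.5, 0.55  # (0.5, 0.55) lies in D2 for rho = 0.8
rate, kz = rate_from_rank_one_reconstruction(d1, d2, rho)
print(rate, d2_branch(d1, d2, rho))
```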

The region $\mathcal{D}_3$ is a degenerate one. For example, $R_{X_1X_2}(D_1,D_2)=R_{X_1}(D_1)$ if $\frac{1-D_2}{1-D_1}\le\rho^2$, which means that the correlation between $X_1$ and $X_2$ is so strong that the optimal coding scheme is to encode $X_1$ to within distortion $D_1$ and ignore $X_2$. Then $X_2$ can be estimated from $\hat X_1$ as

$$\hat X_2=\rho\hat X_1,$$

with mean squared error $1-\rho^2(1-D_1)\le D_2$. The case $\frac{1-D_1}{1-D_2}\le\rho^2$ is dealt with similarly. Hence, we have $C_3(D_1,D_2)=R_{X_1X_2}(D_1,D_2)$. ■
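Two claims in the $\mathcal{D}_3$ argument can be checked numerically: (68) is continuous across the $\mathcal{D}_2/\mathcal{D}_3$ boundary $\frac{1-D_2}{1-D_1}=\rho^2$, where it equals $R_{X_1}(D_1)$, and reading $X_2$ off $\rho\hat X_1$ gives mean squared error $1-\rho^2(1-D_1)$, which equals $D_2$ exactly on that boundary. The Monte Carlo part assumes the Gaussian backward channel $X_1=\hat X_1+Z$ with $\hat X_1\sim\mathcal N(0,1-D_1)$ and $Z\sim\mathcal N(0,D_1)$ independent; sample size, seed and parameter values are arbitrary choices for illustration.

```python
import math
import random


def gaussian_joint_rd(d1, d2, rho):
    """Joint rate distortion function (68), in bits; assumes 0 < Di < 1 and 0 < rho < 1."""
    if (1 - d1) * (1 - d2) >= rho ** 2:  # region D1 = D10 U D11
        return 0.5 * math.log2((1 - rho ** 2) / (d1 * d2))
    if min((1 - d1) / (1 - d2), (1 - d2) / (1 - d1)) >= rho ** 2:  # region D2
        gap = rho - math.sqrt((1 - d1) * (1 - d2))
        return 0.5 * math.log2((1 - rho ** 2) / (d1 * d2 - gap ** 2))
    return 0.5 * math.log2(1 / min(d1, d2))  # degenerate region D3


def mse_x2_from_x1hat(d1, rho, n=200_000, seed=1):
    """Monte Carlo estimate of E(X2 - rho * X1_hat)^2 when only X1 is coded at distortion D1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x1_hat = rng.gauss(0.0, math.sqrt(1 - d1))   # reconstruction of X1
        x1 = x1_hat + rng.gauss(0.0, math.sqrt(d1))  # backward channel X1 = X1_hat + Z
        x2 = rho * x1 + rng.gauss(0.0, math.sqrt(1 - rho ** 2))
        total += (x2 - rho * x1_hat) ** 2
    return total / n


rho, d1 = 0.8, 0.3
d2_boundary = 1 - rho ** 2 * (1 - d1)  # (1 - D2) / (1 - D1) = rho^2
print(mse_x2_from_x1hat(d1, rho), d2_boundary)
```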
The characterization of $C_3(D_1,D_2)$ is plotted in Fig. 6 as a function of the distortion constraints; $C_3(D_1,D_2)=C(X_1,X_2)$ in the shaded region.

Fig. 6. The distortion regions $\mathcal{D}_{10}$, $\mathcal{D}_{11}$, $\mathcal{D}_2$ and $\mathcal{D}_3$ for bivariate Gaussian random variables (horizontal axis $D_1$, with $1-\rho$, $1-\rho^2$ and $1$ marked). $C_3(D_1,D_2)=C(X_1,X_2)$ in the shaded region.

Remarks:



• Similar to the binary case, the claim $C_3(D_1,D_2)=C(X_1,X_2)$ for $(D_1,D_2)\in\mathcal{D}_{10}$ can also be proved using Theorem 7. This is because, for bivariate Gaussian random variables with covariance matrix $K_2'$, $R_{X_1X_2}(1-\rho,1-\rho)$ is achieved by the backward test channel $p_b(x_1,x_2|w)=p(x_1|w)p(x_2|w)$; $(X_1,X_2)$ is successively refinable for any $(D_1,D_2)\le(1-\rho,1-\rho)$ [30]; and $X_i$ is successively refinable for $D_i\le 1-\rho$, $i=1,2$ [27].

• Similarly, $C_3(D_1,D_2)\ge C(X_1,X_2)$ for $(D_1,D_2)\in\mathcal{D}_{11}$, but the exact characterization of $C_3(D_1,D_2)$ remains unknown in this region.

• Let $(D_1,D_2)\le(D_1',D_2')\le(1-\rho,1-\rho)$. Then the rate $R_{X_1X_2}(D_1',D_2')$ is $(D_1,D_2)$-achievable as a common rate in the Gray-Wyner network, i.e., $R_{X_1X_2}(D_1',D_2')\ge C_3(D_1,D_2)$.

This is because, for $(D_1',D_2')\in\mathcal{D}_{10}$, the joint rate distortion function $R_{X_1X_2}(D_1',D_2')$ is achieved by Gaussian distributed $(\hat X_1,\hat X_2)$ satisfying $X_1-\hat X_1-\hat X_2-X_2$, where the covariance matrix of $(\hat X_1,\hat X_2)$ is [30]

$$\begin{bmatrix}1-D_1' & \rho\\ \rho & 1-D_2'\end{bmatrix}.$$

Then, for $(D_1,D_2)\le(D_1',D_2')\le(1-\rho,1-\rho)$, let the rate allocation of $R_0,R_1,R_2$ for the Gray-Wyner network be as follows:

$$\begin{aligned}R_0&=R_{X_1X_2}(D_1',D_2')=\frac12\log\frac{1-\rho^2}{D_1'D_2'},\\ R_i&=R_{X_i|\hat X_1\hat X_2}(D_i)=R_{X_i|\hat X_i}(D_i)=\frac12\log\frac{D_i'}{D_i},\quad i=1,2.\end{aligned} \qquad (71)$$

$R_0$, $R_1$ and $R_2$ in (71) sum up to $R_{X_1X_2}(D_1,D_2)$, so $R_{X_1X_2}(D_1',D_2')$ is $(D_1,D_2)$-achievable. Therefore, in the Gray-Wyner network, we can use the rate allocation in (71) to achieve the distortion $(D_1,D_2)$ for any $(D_1,D_2)\le(D_1',D_2')\le(1-\rho,1-\rho)$. The minimal $R_0$ satisfying (71) is exactly $C(X_1,X_2)$, which is achieved by letting $(D_1',D_2')=(1-\rho,1-\rho)$.
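As in the binary case, the split (71) telescopes: $R_0+R_1+R_2=\frac12\log\frac{1-\rho^2}{D_1D_2}$ for any admissible $(D_1',D_2')$, and the common rate bottoms out at $C(X_1,X_2)$ when $(D_1',D_2')=(1-\rho,1-\rho)$. The sketch also checks that the reconstruction covariance quoted from [30] is positive semidefinite for a chosen $(D_1',D_2')$ inside $\mathcal{D}_{10}$ (where $1-D_i'\ge\rho$) and fails to be for a point outside it. Assumptions: base-2 logs, $0<\rho<1$, illustrative numbers and hypothetical function names.

```python
import math


def gw_allocation(d1, d2, d1p, d2p, rho):
    """Rate split (71) in bits: common rate at (D1', D2'), private refinements to (D1, D2)."""
    r0 = 0.5 * math.log2((1 - rho ** 2) / (d1p * d2p))
    r1 = 0.5 * math.log2(d1p / d1)
    r2 = 0.5 * math.log2(d2p / d2)
    return r0, r1, r2


def reconstruction_cov_is_psd(d1p, d2p, rho):
    """Check that [[1 - D1', rho], [rho, 1 - D2']] is a valid covariance matrix."""
    return 1 - d1p >= 0 and 1 - d2p >= 0 and (1 - d1p) * (1 - d2p) - rho ** 2 >= 0


rho, d1, d2 = 0.6, 0.1, 0.2
print(gw_allocation(d1, d2, 0.3, 0.35, rho))
```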



VI. Conclusion

We have generalized the definition of Wyner's common information and expanded its practical significance by providing a new operational interpretation. The generalization is twofold: the number of dependent variables can be arbitrary, and so can the alphabets of those random variables. We have determined new properties of the generalized Wyner's common information of $N$ dependent variables. More importantly, we have derived a lossy source coding interpretation of Wyner's common information using the Gray-Wyner network. In particular, it is established that the common information is precisely the smallest common message rate when the total rate is arbitrarily close to the rate distortion function with joint decoding. A surprising observation is that this equality holds independently of the values of the distortion constraints, as long as the distortions are within some distortion region. Two examples, the doubly symmetric binary source under Hamming distortion and the bivariate Gaussian source under squared error distortion, are used to illustrate the lossy source coding interpretation of Wyner's common information. The common information of the bivariate Gaussian source and its extension to the multivariate case have also been computed explicitly.

While the lossy source coding interpretation of Wyner's common information presented in this paper is limited to $N=2$ random variables, the results can be extended to an arbitrary number $N$ of random variables.

Appendix

A. Proof of Theorem 5

We first show that $C_3(D_1,D_2)\ge C(D_1,D_2)$. Let $R_0$ be $(D_1,D_2)$-achievable; then there exists an $(n,M_0,M_1,M_2)$ code such that (32)-(34) are satisfied. Define $R_i=\frac1n\log M_i$ for $i=1,2$. Since $(R_0,R_1,R_2)$ is $(D_1,D_2)$-achievable, from Theorem 2 there exists a $W$ such that

$$R_0\ge I(X_1,X_2;W),\qquad R_i\ge R_{X_i|W}(D_i),\quad i=1,2,$$

and, for any $\epsilon>0$,

$$\sum_{i=0}^{2}R_i\le R_{X_1X_2}(D_1,D_2)+\epsilon. \qquad (72)$$
Therefore,

$$\begin{aligned}R_{X_1X_2}(D_1,D_2)+\epsilon&\ge\sum_{i=0}^{2}R_i\\ &\ge I(X_1,X_2;W)+\sum_{i=1}^{2}R_{X_i|W}(D_i)\\ &\ge I(X_1,X_2;W)+R_{X_1X_2|W}(D_1,D_2) \qquad (73)\\ &\ge R_{X_1X_2}(D_1,D_2), \qquad (74)\end{aligned}$$

where (73) is from (15b) and (74) comes from (14b). Since $\epsilon>0$ is arbitrary, we have

$$I(X_1,X_2;W)+R_{X_1|W}(D_1)+R_{X_2|W}(D_2)=R_{X_1X_2}(D_1,D_2). \qquad (75)$$

Hence, if $R_0$ is $(D_1,D_2)$-achievable, there exists a $W$ such that $R_0\ge I(X_1,X_2;W)$ and (75) holds. This shows that $C_3(D_1,D_2)\ge C(D_1,D_2)$.

Next we show that $C_3(D_1,D_2)\le C(D_1,D_2)$. Let $W$ be the random variable that achieves $C(D_1,D_2)$. For any $R_0>C(D_1,D_2)$ and $\epsilon>0$, let

$$\epsilon_1=\min\Big\{\frac{\epsilon}{3},\,R_0-C(D_1,D_2)\Big\}, \qquad (76)$$

and hence $\epsilon_1>0$. From Theorem 2, there exists an $(n,M_0,M_1,M_2)$ code with $Ed_1(X_1,\hat X_1)\le D_1$, $Ed_2(X_2,\hat X_2)\le D_2$, and

$$\frac1n\log M_0\le I(X_1,X_2;W)+\epsilon_1=C(D_1,D_2)+\epsilon_1\le R_0, \qquad (77)$$

$$\frac1n\log M_i\le R_{X_i|W}(D_i)+\epsilon_1 \qquad (78)$$

for $i=1,2$. Summing (77) and (78), we get

$$\sum_{i=0}^{2}\frac1n\log M_i\le I(X_1,X_2;W)+\sum_{i=1}^{2}R_{X_i|W}(D_i)+3\epsilon_1\le R_{X_1X_2}(D_1,D_2)+\epsilon, \qquad (79)$$

where inequality (79) comes from (76) and the definition of $C(D_1,D_2)$. This proves that $R_0$ is $(D_1,D_2)$-achievable, which completes the proof of $C_3(D_1,D_2)\le C(D_1,D_2)$.

B. Direct proof of $C(D_1,D_2)=C^*(D_1,D_2)$

First we show that $C(D_1,D_2)\ge C^*(D_1,D_2)$.

Let $W$ be the variable that achieves $C(D_1,D_2)$, and let $\hat X_1,\hat X_2$ be random variables that achieve $R_{X_1|W}(D_1)$ and $R_{X_2|W}(D_2)$, i.e.,

$$\begin{aligned}
I(X_1,X_2;W)+R_{X_1|W}(D_1)+R_{X_2|W}(D_2)&=R_{X_1X_2}(D_1,D_2), \qquad (80)\\
R_{X_1|W}(D_1)&=I(X_1;\hat X_1|W), \qquad (81)\\
R_{X_2|W}(D_2)&=I(X_2;\hat X_2|W), \qquad (82)\\
E[d_1(X_1,\hat X_1)]&\le D_1, \qquad (83)\\
E[d_2(X_2,\hat X_2)]&\le D_2. \qquad (84)
\end{aligned}$$

Without loss of generality, we can assume that the joint distribution of $(X_1,X_2,\hat X_1,\hat X_2,W)$ factors as $p(x_1,x_2,\hat x_1,\hat x_2,w)=p(x_1,x_2,w)\,p(\hat x_1|x_1,w)\,p(\hat x_2|x_2,w)$, because the distortion constraint $D_1$ does not involve $X_2$ and $D_2$ does not involve $X_1$. We now establish

$$R_{X_1X_2|W}(D_1,D_2)=R_{X_1|W}(D_1)+R_{X_2|W}(D_2).$$
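This follows from (80) and the inequalities

$$R_{X_1X_2|W}(D_1,D_2)+I(X_1,X_2;W)\ge R_{X_1X_2}(D_1,D_2),$$

$$R_{X_1|W}(D_1)+R_{X_2|W}(D_2)\ge R_{X_1X_2|W}(D_1,D_2),$$

from Lemma 1. Therefore, together with (80)-(84), we have

$$\begin{aligned}
R_{X_1X_2|W}(D_1,D_2)&=I(X_1;\hat X_1|W)+I(X_2;\hat X_2|W)\\
&=H(\hat X_1|W)+H(\hat X_2|W)-H(\hat X_1|X_1,W)-H(\hat X_2|X_2,W)\\
&\ge H(\hat X_1,\hat X_2|W)-H(\hat X_1|X_1,W)-H(\hat X_2|X_2,W)\\
&=H(\hat X_1,\hat X_2|W)-H(\hat X_1|W,X_1,X_2)-H(\hat X_2|W,X_1,X_2)\\
&=I(X_1,X_2;\hat X_1,\hat X_2|W)\\
&\ge R_{X_1X_2|W}(D_1,D_2),
\end{aligned}$$

where the fourth and fifth lines use the factorization above. As the left-hand side (LHS) and right-hand side (RHS) of the above inequalities are the same, all the inequalities must be equalities, so we have

$$I(\hat X_1;\hat X_2|W)=0.$$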
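Under the factorization assumed above, the slack in this chain is exactly $I(X_1;\hat X_1|W)+I(X_2;\hat X_2|W)-I(X_1,X_2;\hat X_1,\hat X_2|W)=I(\hat X_1;\hat X_2|W)$, which is why equality forces $I(\hat X_1;\hat X_2|W)=0$. The sketch below confirms this identity on a randomly generated binary pmf of the assumed form; the alphabet sizes, seed and helper names are arbitrary illustrative choices, and entropies are in bits.

```python
import itertools
import math
import random


def H(p, idx):
    """Joint entropy in bits of the coordinates in idx."""
    marg = {}
    for key, q in p.items():
        sub = tuple(key[i] for i in idx)
        marg[sub] = marg.get(sub, 0.0) + q
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)


def mi(p, a, b, given=()):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return H(p, a + given) + H(p, b + given) - H(p, a + b + given) - H(p, given)


def factorized_pmf(seed):
    """p(w, x1, x2, xh1, xh2) = p(w, x1, x2) p(xh1 | x1, w) p(xh2 | x2, w), all binary."""
    rng = random.Random(seed)
    base = {k: rng.random() + 1e-3 for k in itertools.product((0, 1), repeat=3)}
    z = sum(base.values())
    ch1 = {k: rng.random() for k in itertools.product((0, 1), repeat=2)}  # P(xh1 = 1 | x1, w)
    ch2 = {k: rng.random() for k in itertools.product((0, 1), repeat=2)}  # P(xh2 = 1 | x2, w)
    pmf = {}
    for (w, x1, x2), q in base.items():
        for xh1, xh2 in itertools.product((0, 1), repeat=2):
            a = ch1[(x1, w)] if xh1 else 1 - ch1[(x1, w)]
            b = ch2[(x2, w)] if xh2 else 1 - ch2[(x2, w)]
            pmf[(w, x1, x2, xh1, xh2)] = q / z * a * b
    return pmf


# Coordinate indices: 0 = W, 1 = X1, 2 = X2, 3 = X1_hat, 4 = X2_hat.
W, X1, X2, XH1, XH2 = (0,), (1,), (2,), (3,), (4,)
p = factorized_pmf(seed=5)
gap = mi(p, X1, XH1, W) + mi(p, X2, XH2, W) - mi(p, X1 + X2, XH1 + XH2, W)
print(gap, mi(p, XH1, XH2, W))
```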

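Then we have

$$\begin{aligned}
R_{X_1X_2}(D_1,D_2)&=I(X_1,X_2;W)+I(X_1;\hat X_1|W)+I(X_2;\hat X_2|W)\\
&=I(X_1,X_2;W,\hat X_1,\hat X_2)-I(X_1,X_2;\hat X_1,\hat X_2|W)+I(X_1;\hat X_1|W)+I(X_2;\hat X_2|W)\\
&=I(X_1,X_2;\hat X_1,\hat X_2)+I(X_1,X_2;W|\hat X_1,\hat X_2)\\
&\ge I(X_1,X_2;\hat X_1,\hat X_2)\\
&\ge R_{X_1X_2}(D_1,D_2),
\end{aligned}$$

where the third equality uses the chain rule together with $I(X_1;\hat X_1|W)+I(X_2;\hat X_2|W)=I(X_1,X_2;\hat X_1,\hat X_2|W)$, established above. As the LHS and RHS of the above inequalities are the same, all the inequalities must be equalities, so we have

$$I(X_1,X_2;W|\hat X_1,\hat X_2)=0,\qquad I(X_1,X_2;\hat X_1,\hat X_2)=R_{X_1X_2}(D_1,D_2).$$

Therefore, $X_1,X_2,\hat X_1,\hat X_2,W$ satisfy the Markov chains in (38) and (39), and $\hat X_1,\hat X_2$ achieve $R_{X_1X_2}(D_1,D_2)$. Thus, $C(D_1,D_2)\ge C^*(D_1,D_2)$.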

Next we show that $C(D_1,D_2)\le C^*(D_1,D_2)$.

Let $X_1,X_2,\hat X_1^*,\hat X_2^*,W$ achieve $C^*(D_1,D_2)$. Therefore, they satisfy the Markov chains in (38) and (39), $I(X_1,X_2;\hat X_1^*,\hat X_2^*)=R_{X_1X_2}(D_1,D_2)$, and $E[d_1(X_1,\hat X_1^*)]\le D_1$, $E[d_2(X_2,\hat X_2^*)]\le D_2$. Then

$$\begin{aligned}
R_{X_1X_2}(D_1,D_2)&=I(X_1,X_2;\hat X_1^*,\hat X_2^*)\\
&=I(X_1,X_2;W,\hat X_1^*,\hat X_2^*) \qquad (85)\\
&=I(X_1,X_2;W)+I(X_1,X_2;\hat X_1^*,\hat X_2^*|W)\\
&=I(X_1,X_2;W)+H(\hat X_1^*|W)+H(\hat X_2^*|W)-H(\hat X_1^*,\hat X_2^*|X_1,X_2,W) \qquad (86)\\
&=I(X_1,X_2;W)+I(X_1;\hat X_1^*|W)+I(X_2;\hat X_2^*|W)+H(\hat X_1^*|X_1,W)\\
&\qquad+H(\hat X_2^*|X_2,W)-H(\hat X_1^*,\hat X_2^*|X_1,X_2,W)\\
&\ge I(X_1,X_2;W)+I(X_1;\hat X_1^*|W)+I(X_2;\hat X_2^*|W)+H(\hat X_1^*|X_1,X_2,W)\\
&\qquad+H(\hat X_2^*|X_1,X_2,W)-H(\hat X_1^*,\hat X_2^*|X_1,X_2,W) \qquad (87)\\
&=I(X_1,X_2;W)+I(X_1;\hat X_1^*|W)+I(X_2;\hat X_2^*|W)+I(\hat X_1^*;\hat X_2^*|X_1,X_2,W)\\
&\ge I(X_1,X_2;W)+I(X_1;\hat X_1^*|W)+I(X_2;\hat X_2^*|W)\\
&\ge I(X_1,X_2;W)+R_{X_1|W}(D_1)+R_{X_2|W}(D_2)\\
&\ge I(X_1,X_2;W)+R_{X_1X_2|W}(D_1,D_2) \qquad (88)\\
&\ge R_{X_1X_2}(D_1,D_2), \qquad (89)
\end{aligned}$$

where (85) is from the Markov chain $(X_1,X_2)-(\hat X_1^*,\hat X_2^*)-W$, (86) is from the Markov chain $\hat X_1^*-W-\hat X_2^*$, (87) holds because conditioning reduces entropy, and (88) and (89) follow from the properties of rate distortion functions. As the LHS and RHS of the above inequalities are the same, all the inequalities must be equalities, so we have

$$I(X_1,X_2;W)+R_{X_1|W}(D_1)+R_{X_2|W}(D_2)=R_{X_1X_2}(D_1,D_2).$$

Therefore, $C^*(D_1,D_2)=I(X_1,X_2;W)\ge C(D_1,D_2)$.

C. Proof of Lemma 5

Let $W$ be the random variable that achieves $C_3(D_1,D_2)$. Thus, $C_3(D_1,D_2)=I(X_1,X_2;W)$ with

$$R_{X_1|W}(D_1)+R_{X_2|W}(D_2)+I(X_1,X_2;W)=R_{X_1X_2}(D_1,D_2). \qquad (90)$$

Combined with (42), we have

$$\begin{aligned}
R_{X_1}(D_1)+R_{X_2}(D_2)-I(X_1;X_2)&=R_{X_1|W}(D_1)+R_{X_2|W}(D_2)+I(X_1,X_2;W) \qquad (91)\\
&\ge R_{X_1}(D_1)-I(X_1;W)+R_{X_2}(D_2)-I(X_2;W)+I(X_1,X_2;W) \qquad (92)\\
&=R_{X_1}(D_1)+R_{X_2}(D_2)-I(X_1;X_2)+I(X_1;X_2|W) \qquad (93)\\
&\ge R_{X_1}(D_1)+R_{X_2}(D_2)-I(X_1;X_2), \qquad (94)
\end{aligned}$$

where (91) is from (90) and (42), inequality (92) comes from Lemma 1, (93) is by the chain rule, and inequality (94) is by the fact that $I(X_1;X_2|W)\ge 0$.

Because the LHS of (91) is the same as the RHS of (94), all the inequalities above must be equalities. This implies $I(X_1;X_2|W)=0$. Therefore,

$$C_3(D_1,D_2)\ge C(X_1,X_2).$$
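Step (93) rests on the identity $I(X_1,X_2;W)-I(X_1;W)-I(X_2;W)=I(X_1;X_2|W)-I(X_1;X_2)$, which holds for any joint distribution. When $W$ makes $X_1$ and $X_2$ conditionally independent, it reduces to $I(X_1;W)+I(X_2;W)-I(X_1;X_2)=I(X_1,X_2;W)$, the step also used at the end of Appendix E. The sketch checks both on randomly generated finite-alphabet pmfs; alphabet sizes, seeds and helper names are arbitrary, and entropies are in bits.

```python
import itertools
import math
import random


def marginal(p, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = {}
    for key, q in p.items():
        sub = tuple(key[i] for i in idx)
        out[sub] = out.get(sub, 0.0) + q
    return out


def H(p, idx):
    """Joint entropy in bits of the coordinates in idx."""
    return -sum(q * math.log2(q) for q in marginal(p, idx).values() if q > 0)


def mi(p, a, b, given=()):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    return H(p, a + given) + H(p, b + given) - H(p, a + b + given) - H(p, given)


def random_pmf(rng, shape):
    """A strictly positive random pmf on the product alphabet given by shape."""
    keys = list(itertools.product(*(range(k) for k in shape)))
    weights = [rng.random() + 1e-3 for _ in keys]
    total = sum(weights)
    return {k: w / total for k, w in zip(keys, weights)}


def conditionally_independent_pmf(rng, w_size=3):
    """p(w) p(x1 | w) p(x2 | w) with binary X1, X2, so X1 - W - X2 holds."""
    p_w = random_pmf(rng, (w_size,))
    q1 = [rng.random() for _ in range(w_size)]  # P(X1 = 1 | W = w)
    q2 = [rng.random() for _ in range(w_size)]  # P(X2 = 1 | W = w)
    pmf = {}
    for w, x1, x2 in itertools.product(range(w_size), (0, 1), (0, 1)):
        a = q1[w] if x1 else 1 - q1[w]
        b = q2[w] if x2 else 1 - q2[w]
        pmf[(w, x1, x2)] = p_w[(w,)] * a * b
    return pmf


# Coordinate indices: 0 = W, 1 = X1, 2 = X2.
W, X1, X2 = (0,), (1,), (2,)
p = random_pmf(random.Random(7), shape=(3, 2, 2))
lhs = mi(p, X1 + X2, W) - mi(p, X1, W) - mi(p, X2, W)
rhs = mi(p, X1, X2, W) - mi(p, X1, X2)
print(lhs, rhs)
```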

D. Proof of Theorem 6

Let $W$ be the random variable that achieves the common information of $X_1,X_2$. By Lemma 2, there exists a strictly positive surface $\mathcal{D}(X_1X_2|W)$ such that for any $0\le(D_1,D_2)\le\mathcal{D}(X_1X_2|W)$,

$$I(X_1,X_2;W)+R_{X_1X_2|W}(D_1,D_2)=R_{X_1X_2}(D_1,D_2). \qquad (95)$$

Also by Lemma 2, there exists a strictly positive surface $\mathcal{D}(X_1X_2)\ge\mathcal{D}(X_1X_2|W)$ such that for any $0\le(D_1,D_2)\le\mathcal{D}(X_1X_2)$,

$$R_{X_1}(D_1)+R_{X_2}(D_2)-I(X_1;X_2)=R_{X_1X_2}(D_1,D_2). \qquad (96)$$

Since $\mathcal{D}(X_1X_2|W)\le\mathcal{D}(X_1X_2)$, let $\gamma=\mathcal{D}(X_1X_2|W)$; then both equalities (95) and (96) hold for $0\le(D_1,D_2)\le\gamma$. Therefore, from Lemmas 4 and 5, $C_3(D_1,D_2)=C(X_1,X_2)$ for $0\le(D_1,D_2)\le\gamma$.

E. Proof of Theorem 7

First we show that for any $(D_1,D_2)\le(D_1^o,D_2^o)$,

$$R_{X_1X_2|W}(D_1,D_2)+I(X_1,X_2;W)=R_{X_1X_2}(D_1,D_2). \qquad (97)$$

From the definition of $(D_1^o,D_2^o)$ in (45), we have

$$R_{X_1X_2}(D_1^o,D_2^o)\ge R_{X_1}(D_1^o)+R_{X_2}(D_2^o)-I(X_1;X_2)=I(X_1,X_2;W),$$

where the first inequality is from (14c). On the other hand,

$$R_{X_1X_2}(D_1^o,D_2^o)\le I(X_1,X_2;\hat X_1^o,\hat X_2^o)\le I(X_1,X_2;W).$$

Therefore, $R_{X_1X_2}(D_1^o,D_2^o)=I(X_1,X_2;\hat X_1^o,\hat X_2^o)=I(X_1,X_2;W)$.
Let $(\hat X_1,\hat X_2)$ achieve $R_{X_1X_2}(D_1,D_2)$. As the vector source $(X_1,X_2)$ is successively refinable under individual distortion constraints [30], we have the Markov chain $X_1X_2-\hat X_1\hat X_2-\hat X_1^o\hat X_2^o$. Therefore,

$$\begin{aligned}
R_{X_1X_2}(D_1,D_2)-I(X_1,X_2;W)&=I(X_1,X_2;\hat X_1,\hat X_2)-I(X_1,X_2;\hat X_1^o,\hat X_2^o)\\
&=I(X_1,X_2;\hat X_1,\hat X_2|\hat X_1^o,\hat X_2^o)\\
&\ge R_{X_1X_2|\hat X_1^o\hat X_2^o}(D_1,D_2)\\
&\ge R_{X_1X_2|W}(D_1,D_2),
\end{aligned}$$

where the second line follows from the chain rule and the Markov chain above, and the last inequality is from the Markov chain $X_1X_2-W-\hat X_1^o\hat X_2^o$. On the other hand, by Lemma 1, we have

$$R_{X_1X_2|W}(D_1,D_2)+I(X_1,X_2;W)\ge R_{X_1X_2}(D_1,D_2).$$

This establishes (97). Thus, from Lemma 4, $C_3(D_1,D_2)\le C(X_1,X_2)$.
To complete the proof, we need to show

$$R_{X_1}(D_1)+R_{X_2}(D_2)-I(X_1;X_2)=R_{X_1X_2}(D_1,D_2). \qquad (98)$$

From Lemma 1,

$$R_{X_1}(D_1)+R_{X_2}(D_2)-I(X_1;X_2)\le R_{X_1X_2}(D_1,D_2).$$

Therefore, we only need to establish the other direction. For $i=1,2$, let $\hat X_i$ achieve $R_{X_i}(D_i)$; then, by the definition of a successively refinable scalar source [27], we have the Markov chain $X_i-\hat X_i-\hat X_i^o$ for $D_i\le D_i^o$. Therefore,

$$\begin{aligned}
R_{X_i}(D_i)-I(X_i;W)&=I(X_i;\hat X_i)-I(X_i;\hat X_i^o)\\
&=I(X_i;\hat X_i|\hat X_i^o)\\
&\ge R_{X_i|\hat X_i^o}(D_i)\\
&\ge R_{X_i|W}(D_i), \qquad (99)
\end{aligned}$$

where (99) is from the Markov chain $X_i-W-\hat X_i^o$. Using (99), we have

$$\begin{aligned}
R_{X_1}(D_1)+R_{X_2}(D_2)-I(X_1;X_2)&\ge R_{X_1|W}(D_1)+I(X_1;W)+R_{X_2|W}(D_2)+I(X_2;W)-I(X_1;X_2)\\
&=R_{X_1|W}(D_1)+R_{X_2|W}(D_2)+I(X_1,X_2;W)\\
&=R_{X_1X_2|W}(D_1,D_2)+I(X_1,X_2;W)\\
&=R_{X_1X_2}(D_1,D_2),
\end{aligned}$$

which completes the proof.
F. Proof of Theorem 8

First, we show that the common information of $X_1,X_2$ is only a function of the correlation coefficient $\rho$. To show this, let $\tilde X_i=X_i/\sigma_i$, $i=1,2$; thus $\tilde X_1,\tilde X_2$ are jointly Gaussian with zero mean and covariance matrix

$$\begin{bmatrix}1 & \rho\\ \rho & 1\end{bmatrix}.$$

We have the Markov chain $\tilde X_1-X_1-X_2-\tilde X_2$, and by the data processing inequality for Wyner's common information [13], $C(\tilde X_1,\tilde X_2)\le C(X_1,X_2)$. On the other hand, we have the Markov chain $X_1-\tilde X_1-\tilde X_2-X_2$, and $C(X_1,X_2)\le C(\tilde X_1,\tilde X_2)$. Thus, $C(X_1,X_2)=C(\tilde X_1,\tilde X_2)$. Without loss of generality, we consider $\sigma_1^2=\sigma_2^2=1$ in the following.
Let

$$X_i=\sqrt{\rho}\,W+\sqrt{1-\rho}\,N_i,\quad i=1,2, \qquad (100)$$

where $W,N_1,N_2$ are mutually independent standard Gaussian random variables. It is clear that $X_1,X_2$ are bivariate Gaussian with correlation coefficient $\rho$, and

$$C(X_1,X_2)\le I(X_1,X_2;W)=\frac12\log\frac{1+\rho}{1-\rho}.$$

Next we show that

$$C(X_1,X_2)\ge\frac12\log\frac{1+\rho}{1-\rho}.$$

For any $U$ that satisfies the Markov chain $X_1-U-X_2$, let $D_1$ be the minimum mean square error (MMSE) of estimating $X_1$ from $U$, i.e., $D_1=E(X_1-E(X_1|U))^2$. Similarly, let $D_2=E(X_2-E(X_2|U))^2$. We now show that $I(X_1X_2;U)\ge\frac12\log\frac{1+\rho}{1-\rho}$. We have

$$\begin{aligned}
I(X_1X_2;U)&=H(X_1X_2)-H(X_1|U)-H(X_2|U)\\
&=I(X_1;U)+I(X_2;U)-I(X_1;X_2) \qquad (101)\\
&\ge I(X_1;E(X_1|U))+I(X_2;E(X_2|U))-I(X_1;X_2) \qquad (102)\\
&\ge R_{X_1}(D_1)+R_{X_2}(D_2)-I(X_1;X_2) \qquad (103)\\
&=\frac12\log\frac{1-\rho^2}{D_1D_2}
\end{aligned}$$

for $D_1\le 1$, $D_2\le 1$, where the first equality uses the Markov chain $X_1-U-X_2$, (101) is from the chain rule, (102) is from the Markov chains $X_1-U-E(X_1|U)$ and $X_2-U-E(X_2|U)$, and (103) is by the definition of the rate distortion function.
Next we show that $D_1+D_2\le 2(1-\rho)$, $D_1\le 1$ and $D_2\le 1$. We have

$$\begin{aligned}
2(1-\rho)&=E(X_1-X_2)^2\\
&=E[X_1-E(X_1|U)+E(X_1|U)-X_2]^2\\
&=E[X_1-E(X_1|U)]^2+E[E(X_1|U)-X_2]^2+2E[(X_1-E(X_1|U))(E(X_1|U)-X_2)]\\
&=E[X_1-E(X_1|U)]^2+E[E(X_1|U)-X_2]^2 \qquad (104)\\
&=E[X_1-E(X_1|U)]^2+E[E(X_1|U)-E(X_2|U)+E(X_2|U)-X_2]^2\\
&=E[X_1-E(X_1|U)]^2+E[X_2-E(X_2|U)]^2+E[E(X_2|U)-E(X_1|U)]^2\\
&\qquad+2E[(X_2-E(X_2|U))(E(X_2|U)-E(X_1|U))]\\
&=E[X_1-E(X_1|U)]^2+E[X_2-E(X_2|U)]^2+E[E(X_2|U)-E(X_1|U)]^2 \qquad (105)\\
&\ge D_1+D_2,
\end{aligned}$$

where (104) is from (using the Markov chain $X_1-U-X_2$ in the third step)

$$\begin{aligned}
E[(X_1-E(X_1|U))(E(X_1|U)-X_2)]&=E[(X_1-E(X_1|U))E(X_1|U)]-E[(X_1-E(X_1|U))X_2]\\
&=-E[(X_1-E(X_1|U))X_2]\\
&=-E_{U,X_2}\big[X_2\,E_{X_1|U}[X_1-E(X_1|U)]\big]\\
&=-E_{U,X_2}\big[X_2\,(E(X_1|U)-E(X_1|U))\big]=0,
\end{aligned}$$

and (105) is from

$$E[(X_2-E(X_2|U))(E(X_2|U)-E(X_1|U))]=E[(X_2-E(X_2|U))E(X_2|U)]-E[(X_2-E(X_2|U))E(X_1|U)]=0.$$

In addition, $D_1=E[X_1-E(X_1|U)]^2=EX_1^2-E[E(X_1|U)^2]\le EX_1^2=1$, and similarly $D_2\le 1$.



Thus,

$$I(X_1X_2;U)\ge\frac12\log\frac{1-\rho^2}{D_1D_2}\ge\frac12\log\frac{1-\rho^2}{\left(\frac{D_1+D_2}{2}\right)^2}\ge\frac12\log\frac{1-\rho^2}{(1-\rho)^2}=\frac12\log\frac{1+\rho}{1-\rho}.$$
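The final chain uses only $D_1+D_2\le 2(1-\rho)$ and the monotonicity of the logarithm: by the AM-GM inequality, $D_1D_2\le\left(\frac{D_1+D_2}{2}\right)^2\le(1-\rho)^2$. The sketch below is a brute-force grid check of $\frac12\log\frac{1-\rho^2}{D_1D_2}\ge\frac12\log\frac{1+\rho}{1-\rho}$ over that constraint set, with equality at $D_1=D_2=1-\rho$ (the MMSEs obtained with $U=W$ from (100)). The grid resolution and $\rho$ values are arbitrary, logs are base 2, and a grid check is an illustration, not a proof.

```python
import math


def lower_bound(d1, d2, rho):
    """0.5 log2((1 - rho^2) / (D1 D2)): the bound obtained from (103), in bits."""
    return 0.5 * math.log2((1 - rho ** 2) / (d1 * d2))


def common_info(rho):
    """Right-hand side of (60), in bits."""
    return 0.5 * math.log2((1 + rho) / (1 - rho))


def check_grid(rho, steps=200):
    """Minimum of lower_bound - common_info over D1 + D2 <= 2(1 - rho), 0 < Di <= 1."""
    worst = math.inf
    for i in range(1, steps + 1):
        for j in range(1, steps + 1):
            d1, d2 = i / steps, j / steps
            if d1 + d2 <= 2 * (1 - rho):
                worst = min(worst, lower_bound(d1, d2, rho) - common_info(rho))
    return worst


print(check_grid(0.5), check_grid(0.3))
```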